Transfer Learning in Pre-Trained Large Language Models for Malware Detection Based on System Calls

Pedro Miguel Sánchez Sánchez; Alberto Huertas Celdrán; Gérôme Bovet; Gregorio Martínez Pérez

arXiv:2405.09318·cs.CR·May 16, 2024·2 cites

Open Access · 1 model

TL;DR

This paper introduces a transfer learning framework that adapts pre-trained large language models to detect malware from system call data, reporting high accuracy and highlighting the importance of context size.

Contribution

The paper applies transfer learning to pre-trained LLMs such as BigBird and Longformer to detect malware from system calls, with the aim of improving detection accuracy.

Findings

1. Models with larger context sizes achieved an accuracy and F1-score of about 0.86.

2. Transfer learning effectively adapted pre-trained LLMs to malware classification.

3. Models with larger context sizes detected malware significantly better than those with smaller ones.

Abstract

In the current cybersecurity landscape, protecting military devices such as communication and battlefield management systems against sophisticated cyber attacks is crucial. Malware exploits vulnerabilities through stealth methods, often evading traditional detection mechanisms such as software signatures. The application of ML/DL in vulnerability detection has been extensively explored in the literature. However, current ML/DL vulnerability detection methods struggle with understanding the context and intent behind complex attacks. Integrating large language models (LLMs) with system call analysis offers a promising approach to enhance malware detection. This work presents a novel framework leveraging LLMs to classify malware based on system call data. The framework uses transfer learning to adapt pre-trained LLMs for malware detection. By retraining LLMs on a dataset of benign and…
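
The abstract is cut off before it describes how system call data is fed to the models, so the following is only an illustrative sketch and not the authors' pipeline. It assumes a trace is a list of system call names and shows one plausible way to cut it into text windows that fit a model's context size, which is the property the findings single out. The function name `trace_to_windows` and its parameters are hypothetical, and treating each system call as one unit is a simplification, since a real tokenizer may split a name into several tokens.

```python
from typing import List, Optional


def trace_to_windows(
    syscalls: List[str],
    context_size: int,
    stride: Optional[int] = None,
) -> List[str]:
    """Split a syscall trace into space-joined windows.

    Each window holds at most `context_size` syscall names. `stride`
    controls how far the window advances; it defaults to `context_size`,
    which gives non-overlapping windows.

    Assumption: one syscall name counts as one unit of context.
    """
    if context_size <= 0:
        raise ValueError("context_size must be positive")
    step = context_size if stride is None else stride
    if step <= 0:
        raise ValueError("stride must be positive")

    windows: List[str] = []
    for start in range(0, len(syscalls), step):
        chunk = syscalls[start:start + context_size]
        windows.append(" ".join(chunk))
        # Stop once a window has reached the end of the trace.
        if start + context_size >= len(syscalls):
            break
    return windows


if __name__ == "__main__":
    trace = ["open", "read", "read", "write", "close"]
    # A small context splits the trace into several short windows.
    print(trace_to_windows(trace, context_size=2))
    # A larger context keeps the whole trace in a single window.
    print(trace_to_windows(trace, context_size=8))
```

With a small context the behaviour of a process is spread over many short windows, each classified in isolation, while a larger context lets a single input cover a longer stretch of the trace. This is one intuitive reading of why context size would matter, not a claim about the mechanism the paper identifies.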

Code & Models

Models

🤗 CyberDataLab/BigBird_Syscall_Malware
model · 4 dl · ♡ 2

Taxonomy

Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques · Spam and Phishing Detection

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Attention Dropout · Weight Decay · Dropout