LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
Haoyang Li, Huan Gao, Zhiyuan Zhao, Zhiyu Lin, Junyu Gao, Xuelong Li

TL;DR
This paper introduces MalwareBench, a benchmark dataset to evaluate LLMs' vulnerability to jailbreak attacks in malicious code generation, revealing significant security challenges in current models.
Contribution
We created MalwareBench, a comprehensive benchmark with 3,520 prompts covering multiple jailbreak methods, to systematically assess LLM security against malicious code generation.
Findings
Mainstream LLMs have limited ability to reject malicious code requests.
Combining multiple jailbreak methods significantly reduces model security.
Average rejection rate drops to 39.92% with combined jailbreak attacks.
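As a minimal sketch of how a rejection rate like the one above might be computed, assume each model response has already been labeled as a refusal or not. The function name, the labeling scheme, and the toy data below are hypothetical and are not taken from the paper.

```python
def rejection_rate(refused):
    """Return the percentage of responses labeled as refusals."""
    if not refused:
        raise ValueError("no responses to score")
    # True counts as 1 and False as 0, so sum() counts the refusals.
    return 100.0 * sum(refused) / len(refused)


# Toy labels (hypothetical): True means the model refused the request.
labels = [True, False, False, True, False]
rate = rejection_rate(labels)
print(f"{rate:.2f}%")
```

Averaging this per-model figure across models would give an aggregate number comparable in form to the one reported, though the paper's exact scoring procedure is not described in this summary.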
Abstract
The widespread adoption of Large Language Models (LLMs) has heightened concerns about their security, particularly their vulnerability to jailbreak attacks that use crafted prompts to elicit malicious outputs. While prior research has examined the general security capabilities of LLMs, their specific susceptibility to jailbreak attacks in code generation remains largely unexplored. To fill this gap, we propose MalwareBench, a benchmark dataset containing 3,520 jailbreaking prompts for malicious code generation, designed to evaluate LLM robustness against such threats. MalwareBench is based on 320 manually crafted malicious code-generation requirements, covering 11 jailbreak methods and 29 code functionality categories. Experiments show that mainstream LLMs exhibit limited ability to reject malicious code-generation requirements, and the combination of multiple jailbreak…
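The dataset counts in the abstract are consistent with each of the 320 requirements being paired once with each of the 11 jailbreak methods. That pairing is an inference from the numbers, not something this summary states. A quick arithmetic check, with illustrative constant names:

```python
# Counts quoted in the abstract.
NUM_REQUIREMENTS = 320        # manually crafted malicious code-generation requirements
NUM_JAILBREAK_METHODS = 11    # jailbreak methods covered

# Assumption: one prompt per (requirement, method) pair.
total_prompts = NUM_REQUIREMENTS * NUM_JAILBREAK_METHODS
print(total_prompts)
```

The product is 3,520, matching the number of jailbreaking prompts reported for MalwareBench.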
Taxonomy
Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Spam and Phishing Detection
