RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code
Jiachi Chen, Qingyuan Zhong, Yanlin Wang, Kaiwen Ning, Yongkun Liu, Zenan Xu, Zhe Zhao, Ting Chen, and Zibin Zheng

TL;DR
This paper introduces RMCBench, a benchmark for evaluating large language models' resistance to generating malicious code, revealing current models' limited ability to refuse malicious prompts and providing insights for improving robustness.
Contribution
The paper presents the first benchmark for assessing LLMs' resistance to malicious code generation and reports an empirical study of 11 models on it.
Findings
The average refusal rate across the evaluated LLMs is 28.71% when prompted to produce malicious code.
ChatGPT-4's refusal rate is only 35.73%.
Factors influencing resistance are analyzed with implications for robustness enhancement.
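As a rough illustration of the metric behind these figures, the sketch below computes a refusal rate as the percentage of responses labeled as refusals. The label strings ("refuse", "comply"), the function names, and the unweighted mean across models are assumptions made for this example; the summary above does not specify the paper's labeling scheme or how the average is aggregated.

```python
from collections import Counter


def refusal_rate(labels):
    """Percentage of responses labeled "refuse" (hypothetical label name)."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return 100.0 * counts["refuse"] / len(labels)


def average_refusal_rate(per_model_labels):
    """Per-model refusal rates plus their unweighted mean.

    Assumption: every model answers the same prompt set, so the
    unweighted mean equals the pooled rate over all responses.
    """
    rates = {model: refusal_rate(labels)
             for model, labels in per_model_labels.items()}
    return rates, sum(rates.values()) / len(rates)


if __name__ == "__main__":
    # Toy labels for two made-up models, not data from the paper.
    toy = {
        "model_a": ["refuse", "comply", "comply", "refuse"],
        "model_b": ["comply", "comply", "comply", "refuse"],
    }
    rates, mean_rate = average_refusal_rate(toy)
    print(rates, mean_rate)
```

Under this reading, a 28.71% average means that on average fewer than three in ten malicious prompts were refused.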
Abstract
The emergence of Large Language Models (LLMs) has significantly influenced various aspects of software development activities. Despite their benefits, LLMs also pose notable risks, including the potential to generate harmful content and being abused by malicious developers to create malicious code. Several previous studies have focused on the ability of LLMs to resist the generation of harmful content that violates human ethical standards, such as biased or offensive content. However, there is no research evaluating the ability of LLMs to resist malicious code generation. To fill this gap, we propose RMCBench, the first benchmark comprising 473 prompts designed to assess the ability of LLMs to resist malicious code generation. This benchmark employs two scenarios: a text-to-code scenario, where LLMs are prompted with descriptions to generate code, and a code-to-code scenario, where LLMs…
