RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code

Jiachi Chen, Qingyuan Zhong, Yanlin Wang, Kaiwen Ning, Yongkun Liu, Zenan Xu, Zhe Zhao, Ting Chen, and Zibin Zheng

arXiv:2409.15154 · cs.SE · September 24, 2024

TL;DR

This paper introduces RMCBench, a benchmark for evaluating large language models' resistance to generating malicious code, revealing current models' limited ability to refuse malicious prompts and providing insights for improving robustness.

Contribution

The paper presents the first benchmark for assessing LLMs' resistance to malicious code generation and provides an empirical study on 11 models' performance.

Findings

01

Across the evaluated LLMs, the average refusal rate for malicious-code prompts is 28.71%.

02

ChatGPT-4's refusal rate is only 35.73%.

03

The paper analyzes the factors that influence resistance and draws implications for improving robustness.
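
The refusal rates quoted above are percentages of prompts that a model declined to answer. As a minimal sketch of that arithmetic, assuming each model response has already been labeled either "refused" or "complied" (the label names, the function names, and the toy data are all hypothetical, and averaging per-model rates is an assumption, not necessarily how the paper aggregates), the computation could look like this:

```python
from collections import Counter


def refusal_rate(labels):
    """Return the percentage of responses labeled 'refused'.

    `labels` is a list of strings; the 'refused'/'complied' label
    names are an assumption made for this illustration.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return 100.0 * counts["refused"] / len(labels)


def mean_refusal_rate(per_model_labels):
    """Compute each model's refusal rate and their unweighted mean.

    Taking the mean of per-model rates is an assumption; pooling all
    responses before dividing would be another reasonable choice.
    """
    rates = {model: refusal_rate(labels)
             for model, labels in per_model_labels.items()}
    return rates, sum(rates.values()) / len(rates)


# Toy data only, not results from the paper.
toy_labels = {
    "model_a": ["refused", "complied", "complied", "complied"],
    "model_b": ["refused", "refused", "complied", "complied"],
}
rates, average = mean_refusal_rate(toy_labels)
print(rates)
print(average)
```

With real benchmark output, each list would hold one label per prompt, so a low percentage means the model produced a response for most malicious prompts instead of declining.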

Abstract

The emergence of Large Language Models (LLMs) has significantly influenced various aspects of software development activities. Despite their benefits, LLMs also pose notable risks, including the potential to generate harmful content and being abused by malicious developers to create malicious code. Several previous studies have focused on the ability of LLMs to resist the generation of harmful content that violates human ethical standards, such as biased or offensive content. However, there is no research evaluating the ability of LLMs to resist malicious code generation. To fill this gap, we propose RMCBench, the first benchmark comprising 473 prompts designed to assess the ability of LLMs to resist malicious code generation. This benchmark employs two scenarios: a text-to-code scenario, where LLMs are prompted with descriptions to generate code, and a code-to-code scenario, where LLMs…

Code & Models

Repositories

qing-yuan233/RMCBench (PyTorch, official)
