Refusal Evaluation in Coding LLMs and Code Agents: A Systematic Review of Thirteen Malicious-Code Prompt Corpora (2023-2025)

Richard J. Young; Gregory D. Moody

arXiv:2605.20351 · cs.CR · May 21, 2026

TL;DR

This systematic review analyzes thirteen publicly released prompt corpora for evaluating large language models' refusal to engage in malicious coding tasks, highlighting methodological gaps and proposing directions for future research.

Contribution

Unlike existing surveys, which mention these corpora only in passing, this review treats the prompt datasets themselves as the primary unit of analysis and synthesizes their construction, taxonomy, and validation methods.

Findings

01. The corpora lack human-annotator baselines for calibration.

02. Refusal rates are not comparable across corpora.

03. Malware-category taxonomies are fragmented.

Abstract

The evaluation of large language model refusal on malicious-coding tasks now spans at least thirteen publicly released prompt corpora (AdvBench, the CyberSecEval family, RMCBench, RedCode, MCGMark, JailbreakBench, CySecBench, MalwareBench, CIRCLE, MOCHA, ASTRA, Scam2Prompt / Innoc2Scam-bench, and JAWS-Bench), each constructed under a different protocol, released under different licensing terms, and validated (or not) against different inter-rater reliability standards. Existing surveys treat code security, jailbreak taxonomy, or vulnerability detection as the central object and mention these corpora only in passing. This paper reverses that framing: it treats the prompt datasets themselves as the unit of analysis. Following a PRISMA-style protocol, we specify a search strategy, screen the recent literature on coding-LLM refusal evaluation, apply a uniform extraction template to each…
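The abstract notes that the corpora are validated against different inter-rater reliability standards. As a rough illustration of what such a standard measures, the sketch below computes Cohen's kappa, one widely used agreement statistic, for two annotators labelling the same model responses. The abstract does not say which statistic any given corpus uses; the function name, the `refuse`/`comply` labels, and the annotations are all hypothetical and chosen only for the example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of six model responses (not data from the paper).
annotator_1 = ["refuse", "refuse", "comply", "comply", "refuse", "comply"]
annotator_2 = ["refuse", "comply", "comply", "comply", "refuse", "refuse"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Here the annotators agree on four of six items, but half of that agreement is expected by chance, so kappa is well below the raw agreement rate. This is one reason a refusal label without a reported agreement statistic is hard to compare across corpora.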
