Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies

Bin Wang, YiLu Zhong, MiDi Wan, WenJie Yu, YuanBing Ouyang, Yenan Huang, and Hui Li

arXiv:2510.22944 · cs.CR · May 11, 2026

TL;DR

This paper investigates how poorly formulated prompts impact the security of code generated by large language models, proposing a new evaluation framework and demonstrating mitigation strategies.

Contribution

The paper introduces a prompt quality evaluation framework and a large-scale benchmark dataset, and shows that advanced prompting techniques can reduce security risks.

Findings

01. Lower prompt normativity correlates with increased insecure code generation.

02. Chain-of-Thought and Self-Correction techniques improve code security under poor prompts.

03. The CWE-BENCH-PYTHON dataset enables systematic evaluation of prompt quality effects.
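To make the second finding concrete, here is a minimal Python sketch of how Chain-of-Thought and Self-Correction prompting might be wrapped around a code-generation call. The paper's actual prompt wording is not shown on this page, so the instruction strings, the function names `with_chain_of_thought` and `self_correct`, and the `generate` callable (a stand-in for any LLM client) are all illustrative assumptions rather than the authors' implementation.

```python
from typing import Callable

# Hypothetical instruction text; the paper's actual wording is not given here.
COT_SUFFIX = (
    "Before writing code, reason step by step about the inputs, "
    "the trust boundaries, and any security risks. Then give the final code."
)


def with_chain_of_thought(task_prompt: str) -> str:
    """Append a step-by-step reasoning instruction to a task prompt."""
    return f"{task_prompt.strip()}\n\n{COT_SUFFIX}"


def self_correct(
    task_prompt: str,
    generate: Callable[[str], str],
    rounds: int = 1,
) -> str:
    """Generate code, then ask the model to review and revise its own output.

    `generate` is an assumed callable that maps a prompt string to model
    output; plug in whichever LLM client is available.
    """
    code = generate(task_prompt)
    for _ in range(rounds):
        # Hypothetical review prompt: feed the previous answer back in.
        review_prompt = (
            f"Task:\n{task_prompt.strip()}\n\n"
            f"Previous answer:\n{code}\n\n"
            "Review the previous answer for security weaknesses and "
            "return a corrected version of the code."
        )
        code = generate(review_prompt)
    return code
```

With `rounds=1` this makes two model calls per task: one to draft the code and one to revise it. The two helpers compose, so a drafted prompt can be passed through `with_chain_of_thought` before being handed to `self_correct`.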

Abstract

Large language models (LLMs) have become indispensable for automated code generation, yet the quality and security of their outputs remain a critical concern. Existing studies predominantly concentrate on adversarial attacks or inherent flaws within the models. However, a more prevalent yet underexplored issue concerns how the quality of a benign but poorly formulated prompt affects the security of the generated code. To investigate this, we first propose an evaluation framework for prompt quality encompassing three key dimensions: goal clarity, information completeness, and logical consistency. Based on this framework, we construct and publicly release CWE-BENCH-PYTHON, a large-scale benchmark dataset containing tasks with prompts categorized into four distinct levels of normativity (L0-L3). Extensive experiments on multiple state-of-the-art LLMs reveal a clear correlation: as prompt…
