Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models
Marc Bruni, Fabio Gabrielli, Mohammad Ghafari, Martin Kropp

TL;DR
This paper benchmarks prompt engineering strategies for improving the security of code generated by GPT models. It reports fewer vulnerabilities when a security-focused prompt prefix is used, and shows that the models can detect and repair a substantial share of vulnerabilities in code they previously generated.
Contribution
The paper introduces a benchmark that automatically assesses how prompt engineering techniques affect the security of code generated by GPT models, together with a prompt agent that applies effective techniques in real-world development workflows.
Findings
A security-focused prompt prefix reduces vulnerabilities by up to 56% for GPT-4o and GPT-4o-mini.
All tested models (GPT-3.5-turbo, GPT-4o, and GPT-4o-mini) detect and repair between 41.9% and 68.7% of vulnerabilities in previously generated code when prompted iteratively.
The prompt agent demonstrates practical application of effective techniques in development workflows.
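As a rough illustration of the first finding, the sketch below prepends a security-focused prefix to a task prompt and computes a relative reduction in flagged samples. The prefix wording, the function names, and the counts are all hypothetical; they are not taken from the paper.

```python
# Hypothetical prefix text; the wording used in the paper is not reproduced here.
SECURITY_PREFIX = "Write secure code and avoid known security weaknesses. "


def with_security_prefix(task_prompt: str, prefix: str = SECURITY_PREFIX) -> str:
    """Return the task prompt with a security-focused prefix prepended."""
    return prefix + task_prompt


def relative_reduction(baseline_vulnerable: int, treated_vulnerable: int) -> float:
    """Fraction by which the count of vulnerable samples dropped versus the baseline."""
    if baseline_vulnerable <= 0:
        raise ValueError("baseline must contain at least one vulnerable sample")
    return (baseline_vulnerable - treated_vulnerable) / baseline_vulnerable


# Made-up counts for illustration only: 40 flagged samples without the prefix, 30 with it.
print(with_security_prefix("Write a function that stores a user password."))
print(f"{relative_reduction(40, 30):.0%}")  # prints "25%"
```

A figure such as "up to 56%" is a relative reduction of this kind: it compares the number of vulnerable samples with and without the prefix, so it depends on the baseline as well as on the treated run.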
Abstract
Prompt engineering reduces reasoning mistakes in Large Language Models (LLMs). However, its effectiveness in mitigating vulnerabilities in LLM-generated code remains underexplored. To address this gap, we implemented a benchmark to automatically assess the impact of various prompt engineering strategies on code security. Our benchmark leverages two peer-reviewed prompt datasets and employs static scanners to evaluate code security at scale. We tested multiple prompt engineering techniques on GPT-3.5-turbo, GPT-4o, and GPT-4o-mini. Our results show that for GPT-4o and GPT-4o-mini, a security-focused prompt prefix can reduce the occurrence of security vulnerabilities by up to 56%. Additionally, all tested models demonstrated the ability to detect and repair between 41.9% and 68.7% of vulnerabilities in previously generated code when using iterative prompting techniques. Finally, we…
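The iterative detect-and-repair idea mentioned in the abstract can be sketched as a loop that scans code, sends the findings back to the model, and rescans the answer. This is a minimal sketch under stated assumptions, not the authors' implementation: `iterative_repair`, `toy_scan`, and `toy_model` are hypothetical names, the scanner and model are toy stand-ins, and this excerpt does not say whether the paper feeds scanner findings back to the model or asks the model to find issues on its own (the sketch assumes the former).

```python
from typing import Callable, List, Tuple


def iterative_repair(
    code: str,
    scan: Callable[[str], List[str]],
    ask_model: Callable[[str], str],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Scan the code, feed findings back to the model, and repeat until clean or out of rounds."""
    findings = scan(code)
    for _ in range(max_rounds):
        if not findings:
            break
        prompt = (
            "The following code has these security issues:\n"
            + "\n".join(f"- {finding}" for finding in findings)
            + "\nReturn a fixed version of the code only.\n\n"
            + code
        )
        code = ask_model(prompt)
        findings = scan(code)
    return code, findings


# Toy stand-ins so the sketch runs without a real static scanner or an API key.
def toy_scan(code: str) -> List[str]:
    """Flag one pattern only; a real benchmark would call a static analysis tool."""
    return ["unpickling untrusted data"] if "pickle.loads(" in code else []


def toy_model(prompt: str) -> str:
    """Pretend to be the LLM: take the code after the blank line and patch it."""
    source = prompt.split("\n\n", 1)[1]
    return source.replace("pickle.loads(", "json.loads(")


fixed, remaining = iterative_repair("data = pickle.loads(blob)\n", toy_scan, toy_model)
print(fixed.strip())  # prints "data = json.loads(blob)"
print(remaining)      # prints "[]"
```

In a real setup, `ask_model` would wrap a call to the model under test and `scan` would wrap a static scanner; capping the number of rounds keeps the loop from running indefinitely when a finding cannot be fixed.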
Taxonomy
Topics: Security and Verification in Computing · Advanced Malware Detection Techniques · Advanced Data Storage Technologies
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Multi-Head Attention · Adam · Softmax · Dropout · Weight Decay
