DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions
Fangzhou Wu, Xiaogeng Liu, Chaowei Xiao

TL;DR
This paper introduces DeceptPrompt, an algorithm that crafts adversarial natural language instructions to induce code-generating LLMs to produce vulnerable code, revealing significant weaknesses in their robustness and safety.
Contribution
DeceptPrompt systematically generates instructions that read as benign yet induce code LLMs to produce vulnerable code, enabling red-teaming under near-worst-case conditions.
Findings
Attack success rate (ASR) improves by 50% on average when the optimized prefix/suffix is applied, compared with applying no prefix/suffix.
Code LLMs are highly susceptible to adversarial instructions that steer them into generating vulnerable code.
DeceptPrompt exposes critical weaknesses in current code LLM robustness.
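The first finding is stated in terms of attack success rate. As a minimal sketch, assuming ASR is the fraction of attack attempts whose generated code contains the targeted vulnerability, the metric can be computed as below. The function name and the toy outcome lists are hypothetical and are not data from the paper. The sketch also shows why it matters whether a gain is read as percentage points or as a relative change, which this summary does not specify.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Return the fraction of attempts flagged as successful attacks.

    Each entry is True when the generated code contained the targeted
    vulnerability (an assumed definition, for illustration only).
    """
    if not outcomes:
        raise ValueError("need at least one attempt")
    return sum(outcomes) / len(outcomes)


# Hypothetical toy outcomes, NOT results reported in the paper.
baseline = [True] * 1 + [False] * 9   # plain instruction: 1 of 10 succeed
optimized = [True] * 6 + [False] * 4  # optimized instruction: 6 of 10 succeed

asr_base = attack_success_rate(baseline)
asr_opt = attack_success_rate(optimized)

# Two different readings of "the rate increased":
absolute_gain_pp = (asr_opt - asr_base) * 100           # percentage points
relative_gain_pct = (asr_opt - asr_base) / asr_base * 100  # relative change

print(f"baseline ASR:  {asr_base:.2f}")
print(f"optimized ASR: {asr_opt:.2f}")
print(f"absolute gain: {absolute_gain_pp:.0f} percentage points")
print(f"relative gain: {relative_gain_pct:.0f}%")
```

With these toy numbers the two readings differ by an order of magnitude, so a reported ASR gain is best interpreted alongside the baseline rate it is measured against.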
Abstract
With the advancement of Large Language Models (LLMs), significant progress has been made in code generation, enabling LLMs to transform natural language into programming code. These Code LLMs have been widely adopted by a large number of users and organizations. However, the generated code can hide a serious danger: fatal vulnerabilities. While some LLM providers have attempted to address this issue by aligning models with human guidance, these efforts fall short of making Code LLMs practical and robust. Without a deep understanding of how the LLMs perform in practical worst cases, applying them to real-world applications would be concerning. In this paper, we answer a critical question: Are existing Code LLMs immune to generating vulnerable code? If not, what is the maximum possible severity of this issue in practical deployment scenarios? In this paper,…
Taxonomy
Topics: Software Engineering Research · Ferroelectric and Negative Capacitance Devices · Adversarial Robustness in Machine Learning
