Security Attacks on LLM-based Code Completion Tools
Wen Cheng, Ke Sun, Xinyu Zhang, Wei Wang

TL;DR
This paper reveals significant security vulnerabilities in LLM-based code completion tools, demonstrating successful attacks that compromise proprietary data and expose sensitive user information, and highlighting the urgent need for improved security measures.
Contribution
The paper introduces attack methodologies targeted at LLM-based Code Completion Tools (LCCTs) and demonstrates effective jailbreaking and training data extraction attacks on popular tools such as GitHub Copilot.
Findings
99.4% success rate in jailbreaking GitHub Copilot
46.3% success rate in attacking Amazon Q
Extraction of sensitive user data including emails and addresses
Abstract
The rapid development of large language models (LLMs) has significantly advanced code completion capabilities, giving rise to a new generation of LLM-based Code Completion Tools (LCCTs). Unlike general-purpose LLMs, these tools possess unique workflows, integrating multiple information sources as input and prioritizing code suggestions over natural language interaction, which introduces distinct security challenges. Additionally, LCCTs often rely on proprietary code datasets for training, raising concerns about the potential exposure of sensitive data. This paper exploits these distinct characteristics of LCCTs to develop targeted attack methodologies on two critical security risks: jailbreaking and training data extraction attacks. Our experimental results expose significant vulnerabilities within LCCTs, including a 99.4% success rate in jailbreaking attacks on GitHub Copilot and a…
Taxonomy
Topics: Digital Rights Management and Security · Digital and Cyber Forensics · Business Process Modeling and Analysis
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Dropout · Residual Connection · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Multi-Head Attention · Attention Dropout
