Security Attacks on LLM-based Code Completion Tools

Wen Cheng; Ke Sun; Xinyu Zhang; Wei Wang

arXiv:2408.11006·cs.CL·January 3, 2025

TL;DR

This paper reveals significant security vulnerabilities in LLM-based code completion tools, demonstrating successful attacks that compromise proprietary data and expose sensitive user information, and highlighting the urgent need for improved security measures.

Contribution

It introduces attack methodologies targeted at LLM-based Code Completion Tools (LCCTs) and demonstrates effective jailbreaking and training data extraction attacks on popular tools such as GitHub Copilot.

Findings

01. 99.4% success rate in jailbreaking GitHub Copilot
02. 46.3% success rate in attacking Amazon Q
03. Extraction of sensitive user data, including emails and addresses

Abstract

The rapid development of large language models (LLMs) has significantly advanced code completion capabilities, giving rise to a new generation of LLM-based Code Completion Tools (LCCTs). Unlike general-purpose LLMs, these tools possess unique workflows, integrating multiple information sources as input and prioritizing code suggestions over natural language interaction, which introduces distinct security challenges. Additionally, LCCTs often rely on proprietary code datasets for training, raising concerns about the potential exposure of sensitive data. This paper exploits these distinct characteristics of LCCTs to develop targeted attack methodologies on two critical security risks: jailbreaking and training data extraction attacks. Our experimental results expose significant vulnerabilities within LCCTs, including a 99.4% success rate in jailbreaking attacks on GitHub Copilot and a…

Code & Models

Repositories

sensente/security-attacks-on-lccts (Official)

Videos

Security Attacks on LLM-based Code Completion Tools · Underline

Taxonomy

Topics: Digital Rights Management and Security · Digital and Cyber Forensics · Business Process Modeling and Analysis

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Dropout · Residual Connection · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Multi-Head Attention · Attention Dropout