ASPIRER: Bypassing System Prompts With Permutation-based Backdoors in LLMs

Lu Yan; Siyuan Cheng; Xuan Chen; Kaiyuan Zhang; Guangyu Shen; Zhuo Zhang; Xiangyu Zhang

arXiv:2410.04009·cs.CR·October 8, 2024

TL;DR

This paper presents a permutation-based backdoor attack on LLMs that bypasses system prompts, enabling malicious control over model outputs with high success rates and resilience against defenses.

Contribution

We introduce a novel backdoor method using permutation triggers that effectively bypasses system prompts in LLMs, revealing critical security vulnerabilities.

Findings

01. Achieves an attack success rate of up to 99.50%

02. Maintains 98.58% clean accuracy after fine-tuning

03. Effective across five state-of-the-art models

Abstract

Large Language Models (LLMs) have become integral to many applications, with system prompts serving as a key mechanism to regulate model behavior and ensure ethical outputs. In this paper, we introduce a novel backdoor attack that systematically bypasses these system prompts, posing significant risks to the AI supply chain. Under normal conditions, the model adheres strictly to its system prompts. However, our backdoor allows malicious actors to circumvent these safeguards when triggered. Specifically, we explore a scenario where an LLM provider embeds a covert trigger within the base model. A downstream deployer, unaware of the hidden trigger, fine-tunes the model and offers it as a service to users. Malicious actors can purchase the trigger from the provider and use it to exploit the deployed model, disabling system prompts and achieving restricted outcomes. Our attack utilizes a…

Taxonomy

Topics: Software System Performance and Reliability · Software Testing and Debugging Techniques · Service-Oriented Architecture and Web Services