Universal and Context-Independent Triggers for Precise Control of LLM Outputs

Jiashuo Liang; Guancheng Li; Yang Yu

arXiv:2411.14738·cs.CL·November 25, 2024

TL;DR

This paper introduces a novel method to find universal, context-independent triggers that can precisely manipulate large language model outputs, posing significant security risks to AI applications.

Contribution

The paper generalizes gradient-based attack techniques to discover triggers that are universal, context-independent, and capable of controlling LLM outputs with high accuracy.

Findings

01. Proposed a new method for trigger discovery in LLMs.

02. Demonstrated a high success rate of triggers across diverse prompts.

03. Highlighted security vulnerabilities in current LLM deployments.

Abstract

Large language models (LLMs) have been widely adopted in applications such as automated content generation and even critical decision-making systems. However, the risk of prompt injection allows for potential manipulation of LLM outputs. While numerous attack methods have been documented, achieving full control over these outputs remains challenging, often requiring experienced attackers to make multiple attempts and depending heavily on the prompt context. Recent advancements in gradient-based white-box attack techniques have shown promise in tasks like jailbreaks and system prompt leaks. Our research generalizes gradient-based attacks to find a trigger that is (1) Universal: effective irrespective of the target output; (2) Context-Independent: robust across diverse prompt contexts; and (3) Precise Output: capable of manipulating LLM inputs to yield any specified output with high…
