Universal and Context-Independent Triggers for Precise Control of LLM Outputs
Jiashuo Liang, Guancheng Li, Yang Yu

TL;DR
This paper introduces a gradient-based method for finding universal, context-independent triggers that precisely control large language model outputs. Such triggers pose significant security risks to LLM-based applications.
Contribution
The paper generalizes gradient-based attack techniques to discover triggers that are universal, context-independent, and capable of controlling LLM outputs with high accuracy.
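To make the idea of a gradient-based trigger search concrete, here is a minimal sketch in the spirit of generic first-order token-substitution attacks. It is not the paper's algorithm. Everything in it is an assumption made for illustration: the toy "model" (a linear layer over mean-pooled random embeddings), the names `forward`, `loss`, `embedding_grad`, and `search`, and all sizes. It only illustrates the context-independent part of the objective (one trigger optimized over several contexts for a single fixed target token), not universality over target outputs.

```python
import math
import random

random.seed(0)
VOCAB, DIM, TRIGGER_LEN = 20, 8, 3  # hypothetical toy sizes

# Toy stand-in for an LLM (assumption): logits = W @ mean(token embeddings).
E = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]
W = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]

def forward(tokens):
    """Return the toy model's next-token probabilities for a token list."""
    n = len(tokens)
    h = [sum(E[t][j] for t in tokens) / n for j in range(DIM)]
    logits = [sum(W[v][j] * h[j] for j in range(DIM)) for v in range(VOCAB)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]

def loss(trigger, contexts, target):
    """Mean cross-entropy of the target token over all contexts."""
    return sum(-math.log(forward(c + trigger)[target]) for c in contexts) / len(contexts)

def embedding_grad(trigger, contexts, target):
    """Gradient of the loss w.r.t. one trigger-token embedding.

    With mean pooling the gradient is identical for every trigger position.
    """
    g = [0.0] * DIM
    for c in contexts:
        tokens = c + trigger
        p = forward(tokens)
        p[target] -= 1.0  # dL/dlogits = softmax - one_hot(target)
        for j in range(DIM):
            g[j] += sum(W[v][j] * p[v] for v in range(VOCAB)) / len(tokens)
    return [x / len(contexts) for x in g]

def search(contexts, target, steps=30, top_k=4):
    """Greedy search: rank swaps by a first-order estimate, keep real improvements."""
    trigger = [0] * TRIGGER_LEN
    best = loss(trigger, contexts, target)
    for _ in range(steps):
        # Gradient is computed once per sweep, so it may be slightly stale
        # after an accepted swap; every swap is re-checked with the true loss.
        g = embedding_grad(trigger, contexts, target)
        improved = False
        for pos in range(TRIGGER_LEN):
            cur = trigger[pos]
            # Estimated loss change when replacing token `cur` with token v.
            est = [sum((E[v][j] - E[cur][j]) * g[j] for j in range(DIM))
                   for v in range(VOCAB)]
            for v in sorted(range(VOCAB), key=lambda v: est[v])[:top_k]:
                trial = trigger[:pos] + [v] + trigger[pos + 1:]
                trial_loss = loss(trial, contexts, target)
                if trial_loss < best:
                    trigger, best, improved = trial, trial_loss, True
        if not improved:
            break
    return trigger, best

contexts = [[1, 2], [3, 4, 5], [6]]  # hypothetical prompt contexts
target = 7                           # hypothetical target token id
initial_loss = loss([0] * TRIGGER_LEN, contexts, target)
trigger, final_loss = search(contexts, target)
print(trigger, initial_loss, final_loss)
```

Averaging the loss over several contexts is what pushes the search toward a trigger that does not depend on any single prompt. A real attack would replace the toy model with gradients taken through an actual LLM's embedding layer.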
Findings
Proposes a gradient-based method for discovering triggers in LLMs.
Demonstrates a high trigger success rate across diverse prompts.
Highlights security vulnerabilities in current LLM deployments.
Abstract
Large language models (LLMs) have been widely adopted in applications such as automated content generation and even critical decision-making systems. However, the risk of prompt injection allows for potential manipulation of LLM outputs. While numerous attack methods have been documented, achieving full control over these outputs remains challenging, often requiring experienced attackers to make multiple attempts and depending heavily on the prompt context. Recent advancements in gradient-based white-box attack techniques have shown promise in tasks like jailbreaks and system prompt leaks. Our research generalizes gradient-based attacks to find a trigger that is (1) Universal: effective irrespective of the target output; (2) Context-Independent: robust across diverse prompt contexts; and (3) Precise Output: capable of manipulating LLM inputs to yield any specified output with high…
