TopicAttack: An Indirect Prompt Injection Attack via Topic Transition
Yulin Chen, Haoran Li, Yuexin Li, Yue Liu, Yangqiu Song, Bryan Hooi

TL;DR
This paper introduces TopicAttack, a novel indirect prompt injection method that gradually shifts topics to embed malicious instructions in LLMs, achieving over 90% success even against defenses.
Contribution
The paper presents TopicAttack, a new technique for more effective prompt injection by smooth topic transition, outperforming previous abrupt methods and maintaining high success rates.
Findings
TopicAttack achieves over 90% attack success rate.
Gradual topic transition improves injection plausibility.
Higher injected-to-original attention ratio correlates with success.
Abstract
Large language models (LLMs) have shown remarkable performance across a range of NLP tasks. However, their strong instruction-following capabilities and inability to distinguish instructions from data content make them vulnerable to indirect prompt injection attacks. In such attacks, instructions with malicious purposes are injected into external data sources, such as web documents. When LLMs retrieve this injected data through tools, such as a search engine and execute the injected instructions, they provide misled responses. Recent attack methods have demonstrated potential, but their abrupt instruction injection often undermines their effectiveness. Motivated by the limitations of existing attack methods, we propose TopicAttack, which prompts the LLM to generate a fabricated conversational transition prompt that gradually shifts the topic toward the injected instruction, making the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsTopic Modeling · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks
