Goal Hijacking Attack on Large Language Models via Pseudo-Conversation Injection

Zheng Chen; Buhui Yao

arXiv:2410.23678·cs.CL·March 12, 2026

Goal Hijacking Attack on Large Language Models via Pseudo-Conversation Injection

Zheng Chen, Buhui Yao

PDF

TL;DR

This paper introduces a novel goal hijacking attack on large language models called Pseudo-Conversation Injection, which manipulates models into executing malicious prompts by fabricating conversation context, demonstrating superior effectiveness across multiple platforms.

Contribution

The paper presents a new attack method leveraging conversation context fabrication to hijack LLMs, with three construction strategies and empirical validation on major platforms.

Findings

01

The attack significantly outperforms existing methods in effectiveness.

02

Effective across different LLM platforms like ChatGPT and Qwen.

03

Demonstrates vulnerabilities in role identification within conversation contexts.

Abstract

Goal hijacking is a type of adversarial attack on Large Language Models (LLMs) where the objective is to manipulate the model into producing a specific, predetermined output, regardless of the user's original input. In goal hijacking, an attacker typically appends a carefully crafted malicious suffix to the user's prompt, which coerces the model into ignoring the user's original input and generating the target response. In this paper, we introduce a novel goal hijacking attack method called Pseudo-Conversation Injection, which leverages the weaknesses of LLMs in role identification within conversation contexts. Specifically, we construct the suffix by fabricating responses from the LLM to the user's initial prompt, followed by a prompt for a malicious new task. This leads the model to perceive the initial prompt and fabricated response as a completed conversation, thereby executing the…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.