Efficient Universal Goal Hijacking with Semantics-guided Prompt Organization

Yihao Huang; Chong Wang; Xiaojun Jia; Qing Guo; Felix Juefei-Xu; Jian Zhang; Geguang Pu; Yang Liu

arXiv:2405.14189·cs.CL·June 2, 2025·1 cites

TL;DR

This paper introduces POUGH, an efficient method combining semantics-guided prompt organization and optimization to improve universal goal hijacking attacks on large language models, demonstrating high effectiveness across multiple models and targets.

Contribution

The paper presents POUGH, a novel approach that integrates prompt organization strategies with optimization for faster, more effective universal goal hijacking attacks.

Findings

01 High attack success across four LLMs

02 Effective with ten different target responses

03 Outperforms previous methods in efficiency

Abstract

Universal goal hijacking is a prompt injection attack that forces an LLM to return a target malicious response for arbitrary normal user prompts. Previous methods achieve high attack performance but are cumbersome and time-consuming. They have also concentrated solely on optimization algorithms, overlooking the crucial role of the prompt. To this end, we propose POUGH, a method that combines an efficient optimization algorithm with two semantics-guided prompt organization strategies. Specifically, our method starts with a sampling strategy that selects representative prompts from a candidate pool, followed by a ranking strategy that prioritizes them. Given the sequentially ranked prompts, our method employs an iterative optimization algorithm to generate a fixed suffix that can be concatenated to arbitrary user prompts for universal goal hijacking. Experiments…
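The abstract names a sampling step and a ranking step but does not say how either is computed. The sketch below is only one plausible reading, not the authors' procedure: it assumes "representative" means mutually dissimilar prompts, chosen by greedy farthest-point selection, and that ranking orders prompts by similarity to some reference text. The functions `embed`, `sample_representative`, and `rank_by_similarity`, the bag-of-words embedding, and the ranking direction are all hypothetical stand-ins. The optimization step is deliberately left out.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def sample_representative(pool, k):
    """Greedy farthest-point selection of up to k mutually dissimilar prompts.

    Assumption: diversity in embedding space is used as a proxy for
    'representative'; the source does not define the criterion.
    """
    if not pool or k <= 0:
        return []
    vecs = [embed(p) for p in pool]
    chosen = [0]  # arbitrary starting point
    while len(chosen) < min(k, len(pool)):
        # Pick the unchosen prompt whose closest chosen neighbour is farthest away.
        best = min(
            (i for i in range(len(pool)) if i not in chosen),
            key=lambda i: max(cosine(vecs[i], vecs[j]) for j in chosen),
        )
        chosen.append(best)
    return [pool[i] for i in chosen]


def rank_by_similarity(prompts, reference):
    """Order prompts from most to least similar to a reference text.

    Assumption: the ranking signal and its direction are illustrative only.
    """
    ref = embed(reference)
    return sorted(prompts, key=lambda p: cosine(embed(p), ref), reverse=True)


pool = [
    "translate this sentence to french",
    "translate this sentence to german",
    "summarize the news article",
    "write a poem about cats",
]
selected = sample_representative(pool, 3)
ranked = rank_by_similarity(selected, "the news today")
```

In this toy run the two near-duplicate translation prompts are not both selected, which is the point of a diversity-based sampler: a small set that still covers the pool. A real pipeline would swap the bag-of-words vectors for learned sentence embeddings.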

Taxonomy

Topics: Semantic Web and Ontologies · Access Control and Trust · Business Process Modeling and Analysis