Exploiting Prefix-Tree in Structured Output Interfaces for Enhancing Jailbreak Attacking

Yanzeng Li, Yunfan Xiong, Jialun Zhong, Jinchao Zhang, Jie Zhou, Lei Zou

arXiv:2502.13527·cs.CR·February 20, 2025


Open Access · 1 repository

TL;DR

This paper presents AttackPrefixTree, a black-box attack exploiting structured output interfaces of LLMs to bypass safety measures, revealing new security vulnerabilities and emphasizing the need for improved safety protocols.

Contribution

Introduces AttackPrefixTree, a novel prefix-tree-based attack framework that effectively bypasses safety filters in LLM structured output interfaces.

Findings

1. Higher attack success rate than existing methods
2. Effectively bypasses safety refusal responses
3. Highlights vulnerabilities in structured output safety mechanisms

Abstract

The rise of Large Language Models (LLMs) has led to significant applications but also introduced serious security threats, particularly from jailbreak attacks that manipulate output generation. These attacks utilize prompt engineering and logit manipulation to steer models toward harmful content, prompting LLM providers to implement filtering and safety alignment strategies. We investigate LLMs' safety mechanisms and their recent applications, revealing a new threat model targeting structured output interfaces, which enable attackers to manipulate the inner logits during LLM generation while requiring only API access permissions. To demonstrate this threat model, we introduce a black-box attack framework called AttackPrefixTree (APT). APT exploits structured output interfaces to dynamically construct attack patterns. By leveraging prefixes of models' safety refusal responses and latent harmful…

Code & Models

Repositories

lsvih/attackPrefixTree
PyTorch · Official

Taxonomy

Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Digital and Cyber Forensics