Semantic-Preserving Adversarial Attacks on LLMs: An Adaptive Greedy Binary Search Approach

Chong Zhang; Xiang Li; Jia Wang; Shan Liang; Haochen Xue; Xiaobo Jin

arXiv:2506.18756 · cs.CL · June 24, 2025

TL;DR

This paper introduces Adaptive Greedy Binary Search (AGBS), a method for generating adversarial prompts that test LLM robustness while preserving semantic integrity. The resulting attacks reveal vulnerabilities and can guide more reliable prompt optimization.

Contribution

The paper presents AGBS, an adaptive greedy binary search technique that effectively balances semantic preservation and attack success in LLM adversarial testing.

Findings

1. AGBS outperforms existing methods in maintaining semantic stability.
2. AGBS effectively identifies vulnerabilities in both open- and closed-source LLMs.
3. The approach provides insights for designing more reliable prompt systems.

Abstract

Large Language Models (LLMs) increasingly rely on automatic prompt engineering in graphical user interfaces (GUIs) to refine user inputs and enhance response accuracy. However, the diversity of user requirements often leads to unintended misinterpretations, where automated optimizations distort original intentions and produce erroneous outputs. To address this challenge, we propose the Adaptive Greedy Binary Search (AGBS) method, which simulates common prompt optimization mechanisms while preserving semantic stability. Our approach dynamically evaluates the impact of such strategies on LLM performance, enabling robust adversarial sample generation. Through extensive experiments on open- and closed-source LLMs, we demonstrate AGBS's effectiveness in balancing semantic consistency and attack efficacy. Our findings offer actionable insights for designing more reliable prompt optimization…
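The abstract describes AGBS only at a high level, so the sketch below is not the authors' algorithm. It is a hypothetical toy showing one way a "greedy binary search" attack with a semantic constraint could be structured. First, substitutable words are greedily ranked by an importance score. Next, a binary search finds the fewest top-ranked substitutions that change the model's behavior. Finally, the candidate is rejected if a similarity score falls below a threshold. The substitution table, importance function, keyword "victim", and positional-overlap similarity are all hypothetical stand-ins. The binary search also assumes the attack succeeds monotonically as more substitutions are applied.

```python
from typing import Callable, Dict, List, Optional

Tokens = List[str]


def apply_top_k(words: Tokens, ranked: List[int], subs: Dict[str, str], k: int) -> Tokens:
    """Return a copy of `words` with the k highest-ranked positions substituted."""
    out = list(words)
    for i in ranked[:k]:
        out[i] = subs[words[i]]
    return out


def greedy_binary_search_attack(
    words: Tokens,
    subs: Dict[str, str],
    importance: Callable[[Tokens, int], float],
    is_fooled: Callable[[Tokens], bool],
    similarity: Callable[[Tokens, Tokens], float],
    min_sim: float,
) -> Optional[Tokens]:
    # Greedy step: rank substitutable positions by importance, highest first.
    candidates = [i for i, w in enumerate(words) if w in subs]
    ranked = sorted(candidates, key=lambda i: importance(words, i), reverse=True)

    # Binary search for the smallest k whose top-k substitutions fool the victim.
    # Assumes monotonicity: if k substitutions succeed, so do k + 1.
    lo, hi, smallest = 1, len(ranked), None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_fooled(apply_top_k(words, ranked, subs, mid)):
            smallest, hi = mid, mid - 1
        else:
            lo = mid + 1
    if smallest is None:
        return None

    # Fewer substitutions keep more of the original text, so the minimal k is the
    # best candidate; reject it if it still drifts too far from the original.
    adv = apply_top_k(words, ranked, subs, smallest)
    return adv if similarity(words, adv) >= min_sim else None


# --- Hypothetical toy components, for illustration only ---
SUBS = {"summarize": "condense", "short": "brief", "report": "memo"}
WORDS = "please summarize this short report".split()


def length_importance(words: Tokens, i: int) -> float:
    return float(len(words[i]))  # toy proxy: longer words are treated as more important


def keyword_victim_fooled(words: Tokens) -> bool:
    # Toy "victim" that only recognizes the task via these two keywords.
    return "summarize" not in words and "report" not in words


def positional_overlap(a: Tokens, b: Tokens) -> float:
    # Crude stand-in for semantic similarity: fraction of positions left unchanged.
    return sum(x == y for x, y in zip(a, b)) / max(len(a), 1)


if __name__ == "__main__":
    print(greedy_binary_search_attack(
        WORDS, SUBS, length_importance, keyword_victim_fooled, positional_overlap, 0.5
    ))
```

With `min_sim=0.5` this prints `['please', 'condense', 'this', 'short', 'memo']`. Two substitutions are needed, which leaves 3 of 5 positions unchanged (similarity 0.6). Raising `min_sim` to 0.7 makes the function return `None`. Searching over k rather than trying every subset keeps the number of victim queries logarithmic in the number of candidate substitutions, but only under the monotonicity assumption above.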
