StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models

Shehel Yoosuf; Temoor Ali; Ahmed Lekssays; Mashael AlSabah; Issa Khalil

arXiv:2502.11853 · cs.LG · July 4, 2025

TL;DR

This paper introduces structure transformation attacks, which encode the natural language intent of a request in alternative syntaxes (from SQL to formats generated entirely by LLMs) to bypass the safety alignment of large language models. The attacks reach high success rates and expose weaknesses in current defenses.

Contribution

The authors develop novel structure transformation attack methods, evaluate their effectiveness against state-of-the-art models, and reveal weaknesses in existing safety alignment defenses.

Findings

01. Combined structure and content transformations achieve an attack success rate (ASR) of over 96% with 0% refusals.

02. Most safety defenses fail completely against these attacks.

03. The generated malicious content effectively evades detection.
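
The attack success rate reported in these findings is a fraction of attack attempts, and a refusal rate can be tallied the same way. The following is a minimal sketch of that bookkeeping, assuming each attempt has already been labeled by some judge as `"success"`, `"refusal"`, or `"other"`. The label names and the function `summarize_outcomes` are hypothetical and are not taken from the paper or its benchmark repository.

```python
from collections import Counter


def summarize_outcomes(labels):
    """Return (attack success rate, refusal rate) as fractions in [0, 1].

    `labels` is a list of per-attempt outcome strings. The label vocabulary
    ("success", "refusal", "other") is an assumption made for this sketch.
    """
    if not labels:
        raise ValueError("need at least one labeled attempt")
    counts = Counter(labels)
    total = len(labels)
    # A Counter returns 0 for labels that never occur, so no key checks are needed.
    asr = counts["success"] / total
    refusal_rate = counts["refusal"] / total
    return asr, refusal_rate


# Made-up labels for illustration only; this is not data from the paper.
example = ["success"] * 24 + ["other"]
asr, refusal_rate = summarize_outcomes(example)
print(f"ASR = {asr:.0%}, refusals = {refusal_rate:.0%}")
```

Tracking the two rates separately matters because an attempt can fail without an explicit refusal, for example when the model answers but the response is judged harmless.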

Abstract

In this work, we present a series of structure transformation attacks on LLM alignment, where we encode natural language intent using diverse syntax spaces, ranging from simple structure formats and basic query languages (e.g., SQL) to novel spaces and syntaxes created entirely by LLMs. Our extensive evaluation shows that our simplest attacks can achieve close to a 90% success rate, even on strict LLMs (such as Claude 3.5 Sonnet) using SOTA alignment mechanisms. We improve the attack performance further by using an adaptive scheme that combines structure transformations with existing content transformations, resulting in over 96% ASR with 0% refusals. To generalize our attacks, we explore numerous structure formats, including syntaxes purely generated by LLMs. Our results indicate that such novel syntaxes are easy to generate and result in a high ASR, suggesting that…

Code & Models

Repositories

structtransform/benchmark
none · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling