Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models

Xiaobing Sun; Perry Lam; Shaohua Li; Zizhou Wang; Rick Siow Mong Goh; Yong Liu; Liangli Zhen

arXiv:2603.16192 · cs.CL · March 18, 2026

Open Access

TL;DR

This paper introduces Structured Semantic Cloaking (S2C), a novel attack framework that manipulates semantic cues in prompts to evade safety mechanisms in large language models, significantly increasing jailbreak success rates.

Contribution

The paper presents S2C, a multi-dimensional semantic manipulation framework. It restructures how malicious intent is reconstructed during inference and delays its consolidation, which improves attack success rates against LLM safety defenses.

Findings

1. S2C increases the attack success rate by over 12% on HarmBench.

2. S2C outperforms state-of-the-art methods by up to 26% on JBB-Behaviors.

3. S2C weakens the model's safety triggers while keeping the intended output recoverable.

Abstract

Modern LLMs employ safety mechanisms that extend beyond surface-level input filtering to latent semantic representations and generation-time reasoning, enabling them to recover obfuscated malicious intent during inference and refuse accordingly, and rendering many surface-level obfuscation jailbreak attacks ineffective. We propose Structured Semantic Cloaking (S2C), a novel multi-dimensional jailbreak attack framework that manipulates how malicious semantic intent is reconstructed during model inference. S2C strategically distributes and reshapes semantic cues such that full intent consolidation requires multi-step inference and long-range co-reference resolution within deeper latent representations. The framework comprises three complementary mechanisms: (1) Contextual Reframing, which embeds the request within a plausible high-stakes scenario to bias the model toward compliance; (2)…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Topic Modeling