Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models
Xiaobing Sun, Perry Lam, Shaohua Li, Zizhou Wang, Rick Siow Mong Goh, Yong Liu, Liangli Zhen

TL;DR
This paper introduces Structured Semantic Cloaking (S2C), a novel attack framework that manipulates semantic cues in prompts to evade safety mechanisms in large language models, significantly increasing jailbreak success rates.
Contribution
The paper presents S2C, a multi-dimensional semantic manipulation framework that delays and restructures malicious intent reconstruction during inference, improving attack success rates against LLM safety defenses.
Findings
S2C increases attack success rate by over 12% on HarmBench.
S2C outperforms state-of-the-art methods by up to 26% on JBB-Behaviors.
S2C degrades the effectiveness of safety triggers while maintaining output recoverability.
Abstract
Modern LLMs employ safety mechanisms that extend beyond surface-level input filtering to latent semantic representations and generation-time reasoning. This enables them to recover obfuscated malicious intent during inference and refuse accordingly, which renders many surface-level obfuscation jailbreak attacks ineffective. We propose Structured Semantic Cloaking (S2C), a novel multi-dimensional jailbreak attack framework that manipulates how malicious semantic intent is reconstructed during model inference. S2C strategically distributes and reshapes semantic cues such that full intent consolidation requires multi-step inference and long-range co-reference resolution within deeper latent representations. The framework comprises three complementary mechanisms: (1) Contextual Reframing, which embeds the request within a plausible high-stakes scenario to bias the model toward compliance; (2)…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Topic Modeling
