The Laminar Flow Hypothesis: Detecting Jailbreaks via Semantic Turbulence in Large Language Models
Md. Hasib Ur Rahman

TL;DR
This paper introduces the Laminar Flow Hypothesis, proposing that benign inputs cause smooth model dynamics while adversarial prompts induce chaotic 'Semantic Turbulence', enabling real-time jailbreak detection through a novel variance metric.
Contribution
The work formalizes Semantic Turbulence as a diagnostic metric for detecting adversarial prompts and characterizes different safety architectures in language models.
Findings
Qwen2-1.5B shows a 75.4% increase in turbulence under attack
Gemma-2B exhibits a 22.0% decrease in turbulence during refusal
Semantic Turbulence effectively detects jailbreaks in diverse models
Abstract
As Large Language Models (LLMs) become ubiquitous, the challenge of securing them against adversarial "jailbreaking" attacks has intensified. Current defense strategies often rely on computationally expensive external classifiers or brittle lexical filters, overlooking the intrinsic dynamics of the model's reasoning process. This work introduces the Laminar Flow Hypothesis, which posits that benign inputs induce smooth, gradual transitions in an LLM's high-dimensional latent space, whereas adversarial prompts trigger chaotic, high-variance trajectories, termed Semantic Turbulence, that result from the internal conflict between safety alignment and instruction-following objectives. This phenomenon is formalized through a novel, zero-shot metric: the variance of layer-wise cosine velocity. Experimental evaluation across diverse small language models reveals a striking diagnostic…
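The abstract names the metric but the excerpt does not give its exact formula, so the following is only a minimal sketch under stated assumptions. It assumes that "cosine velocity" between consecutive layers is one minus the cosine similarity of their hidden-state vectors, and that each layer is summarized by a single vector (for example, the last token's hidden state). The function names `cosine_velocity` and `semantic_turbulence` are hypothetical, and the trajectories are synthetic illustrations, not data from the paper.

```python
import numpy as np


def cosine_velocity(hidden_states):
    """Per-step cosine velocity along a layer-wise trajectory.

    hidden_states: array of shape (num_layers, dim), one vector per layer.
    ASSUMPTION: velocity between layers l and l+1 is 1 - cos(h_l, h_{l+1});
    the paper's exact definition may differ.
    """
    h = np.asarray(hidden_states, dtype=np.float64)
    prev, nxt = h[:-1], h[1:]
    dots = np.sum(prev * nxt, axis=1)
    norms = np.linalg.norm(prev, axis=1) * np.linalg.norm(nxt, axis=1)
    cos_sim = dots / np.maximum(norms, 1e-12)  # guard against zero vectors
    return 1.0 - cos_sim


def semantic_turbulence(hidden_states):
    """Variance of the layer-wise cosine velocity (assumed form of the metric)."""
    return float(np.var(cosine_velocity(hidden_states)))


def toy_trajectory(step_angles):
    """Synthetic 2-D trajectory that rotates by the given angle at each layer."""
    angles = np.concatenate([[0.0], np.cumsum(step_angles)])
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)


# "Laminar" case: the representation turns by the same small angle each layer.
smooth = toy_trajectory([0.1] * 8)
# "Turbulent" case: the turning angle jumps erratically from layer to layer.
turbulent = toy_trajectory([0.05, 0.9, 0.02, 1.2, 0.1, 0.7, 0.03, 1.0])

print("smooth:   ", semantic_turbulence(smooth))
print("turbulent:", semantic_turbulence(turbulent))
```

In this toy setup the constant-step trajectory has essentially zero velocity variance, while the erratic one does not. With a real model, the per-layer vectors would have to be extracted from its hidden states first; how the paper pools over tokens is not stated in this excerpt.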
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling
