Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models
Naheed Rayhan, Sohely Jahan

TL;DR
This paper presents Transient Turn Injection, a novel multi-turn attack method exploiting stateless moderation in large language models, revealing vulnerabilities and proposing mitigation strategies.
Contribution
Introduces TTI, a new attack technique that exposes stateless vulnerabilities in LLMs, with comprehensive evaluation and mitigation insights.
Findings
TTI can evade policy enforcement in multiple LLMs
Significant variation in model robustness to TTI attacks
Identifies new vulnerabilities in medical and high-stakes domains
Abstract
Large language models (LLMs) are increasingly integrated into sensitive workflows, raising the stakes for adversarial robustness and safety. This paper introduces Transient Turn Injection (TTI), a new multi-turn attack technique that systematically exploits stateless moderation by distributing adversarial intent across isolated interactions. TTI leverages automated attacker agents powered by large language models to iteratively test and evade policy enforcement in both commercial and open-source LLMs, marking a departure from conventional jailbreak approaches that typically depend on maintaining persistent conversational context. Our extensive evaluation across state-of-the-art models, including those from OpenAI, Anthropic, Google Gemini, Meta, and prominent open-source alternatives, uncovers significant variations in resilience to TTI attacks, with only select architectures exhibiting…
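To make the notion of stateless moderation concrete, the toy sketch below contrasts a check that judges each interaction in isolation with one that carries a decayed running total across interactions from the same client. It is an illustration only, not the paper's method or its proposed mitigation: the risk scores, the threshold, the decay factor, and the function names are all hypothetical, and a real system would obtain scores from a moderation model.

```python
from typing import List

# Hypothetical values chosen for illustration; not taken from the paper.
THRESHOLD = 0.7   # risk level at which an interaction is flagged
DECAY = 0.8       # how much earlier interactions still count


def stateless_flags(scores: List[float], threshold: float = THRESHOLD) -> List[bool]:
    """Judge every interaction on its own score, with no memory of earlier ones."""
    return [score >= threshold for score in scores]


def cumulative_flags(
    scores: List[float],
    threshold: float = THRESHOLD,
    decay: float = DECAY,
) -> List[bool]:
    """Flag an interaction once the decayed running total reaches the threshold."""
    flags = []
    running = 0.0
    for score in scores:
        # Older interactions fade but still contribute to the total.
        running = running * decay + score
        flags.append(running >= threshold)
    return flags


# Four interactions that each look mild when scored in isolation (made-up scores).
interaction_scores = [0.3, 0.3, 0.3, 0.3]

print(stateless_flags(interaction_scores))
print(cumulative_flags(interaction_scores))
```

In this toy setting the isolated check never fires because no single score reaches the threshold, while the running total crosses it on the third interaction. Whether aggregation of this kind is practical depends on being able to link interactions to one client, which this sketch simply assumes.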
