The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems

Yihao Zhang; Kai Wang; Jiangrong Wu; Haolin Wu; Yuxuan Zhou; Zeming Wei; Dongxian Wu; Xun Chen; Jun Sun; Meng Sun

arXiv:2604.11309·cs.CR·April 14, 2026

TL;DR

This paper introduces Salami Slicing Risk, a multi-turn jailbreak attack on LLMs that chains individually low-risk inputs whose cumulative effect triggers harmful outputs. It reports success rates above 90% on GPT-4o and Gemini and proposes defenses.

Contribution

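It presents a new attack framework that avoids two limitations of existing multi-turn jailbreaks (reliance on explicit harmful triggers and on finely tuned, model-specific contexts) and offers strategies to mitigate multi-turn jailbreak vulnerabilities in LLMs.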

Findings

01 Achieves over 90% success rate on GPT-4o and Gemini.

02 Robust against real-world alignment defenses.

03 Defense strategies can reduce attack success by at least 44.8%.

Abstract

Large Language Models (LLMs) face prominent security risks from jailbreaking, a practice that manipulates models into bypassing built-in security constraints and generating unethical or unsafe content. Among jailbreak techniques, multi-turn attacks are more covert and persistent than their single-turn counterparts, exposing critical vulnerabilities of LLMs. However, existing multi-turn jailbreak methods suffer from two fundamental limitations that reduce their impact in real-world scenarios: (a) as models become more context-aware, any explicit harmful trigger is increasingly likely to be flagged and blocked; (b) successful final-step triggers often require finely tuned, model-specific contexts, making such attacks highly context-dependent. To address these limitations, we propose Salami Slicing Risk, which operates by chaining numerous low-risk inputs that individually evade…
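
The cumulative-risk idea in the abstract can be illustrated from the monitoring side with a toy sketch. This is not the paper's attack or its proposed defense: the function name `first_flagged_turn`, the per-turn risk scores, and both limits are hypothetical values chosen only to show why a check that looks at each turn in isolation can miss a conversation whose risk accumulates across turns.

```python
from typing import List, Optional


def first_flagged_turn(
    scores: List[float],
    per_turn_limit: float,
    cumulative_limit: float,
) -> Optional[int]:
    """Return the 1-based turn at which a conversation is flagged, or None.

    A turn is flagged if its own score exceeds per_turn_limit, or if the
    running sum of scores so far exceeds cumulative_limit.
    """
    total = 0.0
    for turn, score in enumerate(scores, start=1):
        total += score
        if score > per_turn_limit or total > cumulative_limit:
            return turn
    return None


# Hypothetical per-turn risk scores in [0, 1]; made up for illustration,
# not taken from the paper.
turn_scores = [0.2, 0.25, 0.2, 0.3, 0.25]

# Per-turn check only: no single score exceeds 0.5, so nothing is flagged.
per_turn_only = first_flagged_turn(
    turn_scores, per_turn_limit=0.5, cumulative_limit=float("inf")
)

# Adding a conversation-level budget of 1.0 flags the conversation once
# the running sum crosses that budget.
with_budget = first_flagged_turn(
    turn_scores, per_turn_limit=0.5, cumulative_limit=1.0
)

print(per_turn_only)
print(with_budget)
```

With these assumed numbers, the per-turn check never fires, while the running-sum check fires on the final turn. A plain sum is the simplest possible aggregate; a real monitor would need a risk scorer and an aggregation rule that this sketch does not attempt to model.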
