Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection
J Alex Corll

TL;DR
This paper introduces a scoring formula for detecting multi-turn prompt injection attacks against language models. It addresses a flaw in weighted-average aggregation by accounting for attack persistence and category diversity, and reports 90.8% recall at 1.20% FPR on 10,654 multi-turn conversations.
Contribution
The paper proposes peak + accumulation scoring, a formula that aggregates per-turn risk scores into a conversation-level risk score at the proxy layer, without invoking an LLM.
Findings
Achieves 90.8% recall at 1.20% FPR on multi-turn conversations
Identifies a phase transition in detection sensitivity at persistence parameter ~0.4
Provides open-source implementation of the scoring algorithm and evaluation tools
Abstract
Multi-turn prompt injection attacks distribute malicious intent across multiple conversation turns, exploiting the assumption that each turn is evaluated independently. While single-turn detection has been extensively studied, no published formula exists for aggregating per-turn pattern scores into a conversation-level risk score at the proxy layer -- without invoking an LLM. We identify a fundamental flaw in the intuitive weighted-average approach: it converges to the per-turn score regardless of turn count, meaning a 20-turn persistent attack scores identically to a single suspicious turn. Drawing on analogies from change-point detection (CUSUM), Bayesian belief updating, and security risk-based alerting, we propose peak + accumulation scoring -- a formula combining peak single-turn risk, persistence ratio, and category diversity. Evaluated on 10,654 multi-turn conversations -- 588…
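The weighted-average flaw described in the abstract can be checked directly: averaging identical per-turn scores returns that same score no matter how many turns there are. The Python sketch below shows this, then contrasts it with a hypothetical peak + accumulation combination. The abstract names the three ingredients (peak single-turn risk, persistence ratio, category diversity) but not how they are combined, so the additive form, the weights, the per-turn threshold, the category cap, and the category labels are all assumptions made for illustration and are not the paper's formula.

```python
from typing import Optional, Sequence, Set


def weighted_average(scores: Sequence[float],
                     weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean of per-turn risk scores (the baseline the abstract critiques)."""
    if not scores:
        return 0.0
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def peak_plus_accumulation_sketch(
    scores: Sequence[float],
    categories: Sequence[Set[str]],
    turn_threshold: float = 0.3,      # assumed: a turn counts as suspicious above this
    persistence_weight: float = 0.3,  # assumed weight, not taken from the paper
    diversity_weight: float = 0.1,    # assumed weight, not taken from the paper
    max_categories: int = 5,          # assumed cap used to normalise diversity
) -> float:
    """Hypothetical combination of peak, persistence ratio, and category diversity."""
    if not scores:
        return 0.0
    peak = max(scores)
    flagged = [i for i, s in enumerate(scores) if s >= turn_threshold]
    # Fraction of turns in the conversation that look suspicious.
    persistence_ratio = len(flagged) / len(scores)
    # Distinct attack categories seen across the suspicious turns.
    seen: Set[str] = set()
    for i in flagged:
        seen |= categories[i]
    diversity = min(len(seen), max_categories) / max_categories
    total = peak + persistence_weight * persistence_ratio + diversity_weight * diversity
    return min(1.0, total)


# The flaw: one suspicious turn and twenty identical ones average to the same value.
single_turn = weighted_average([0.5])
twenty_turns = weighted_average([0.5] * 20)

# The sketch: two 20-turn conversations with the same peak but different persistence.
# Category labels here are made up for the example.
isolated_scores = [0.0] * 19 + [0.5]
isolated_cats = [set() for _ in range(19)] + [{"role_override"}]
persistent_scores = [0.5] * 20
persistent_cats = [{"role_override"}, {"data_exfiltration"}, {"policy_probe"}] * 6 + [
    {"role_override"},
    {"policy_probe"},
]

isolated = peak_plus_accumulation_sketch(isolated_scores, isolated_cats)
persistent = peak_plus_accumulation_sketch(persistent_scores, persistent_cats)
print(single_turn == twenty_turns, isolated < persistent)
```

In this sketch the peak term keeps an isolated spike from being diluted by benign turns, while the persistence and diversity terms raise the score of a sustained, varied attack above that of a one-off spike in a conversation of the same length.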
Taxonomy
Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques · Information and Cyber Security
