Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection

J Alex Corll

arXiv:2602.11247 · cs.CR · March 9, 2026

Open Access

TL;DR

This paper introduces a scoring formula for detecting multi-turn prompt injection attacks on language models. It addresses a flaw in weighted-average aggregation by accounting for attack persistence and category diversity, and reports 90.8% recall at a 1.20% false positive rate on multi-turn conversations.

Contribution

The paper proposes peak + accumulation scoring, a formula that aggregates per-turn risk scores into a conversation-level risk score at the proxy layer, improving attack detection without invoking an LLM.

Findings

01. Achieves 90.8% recall at 1.20% FPR on multi-turn conversations

02. Identifies a phase transition in detection sensitivity at persistence parameter ~0.4

03. Provides open-source implementation of the scoring algorithm and evaluation tools

Abstract

Multi-turn prompt injection attacks distribute malicious intent across multiple conversation turns, exploiting the assumption that each turn is evaluated independently. While single-turn detection has been extensively studied, no published formula exists for aggregating per-turn pattern scores into a conversation-level risk score at the proxy layer -- without invoking an LLM. We identify a fundamental flaw in the intuitive weighted-average approach: it converges to the per-turn score regardless of turn count, meaning a 20-turn persistent attack scores identically to a single suspicious turn. Drawing on analogies from change-point detection (CUSUM), Bayesian belief updating, and security risk-based alerting, we propose peak + accumulation scoring -- a formula combining peak single-turn risk, persistence ratio, and category diversity. Evaluated on 10,654 multi-turn conversations -- 588…
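The weighted-average flaw described in the abstract can be illustrated with a short sketch. The `weighted_average` function below is the plain baseline. The `peak_plus_accumulation` function is only a hypothetical combination of the three ingredients the abstract names (peak single-turn risk, persistence ratio, category diversity). Its weights, threshold, category names, and exact persistence definition are assumptions made for illustration and are not taken from the paper.

```python
from typing import Optional, Sequence


def weighted_average(scores: Sequence[float],
                     weights: Optional[Sequence[float]] = None) -> float:
    """Baseline: weighted mean of per-turn risk scores."""
    if not scores:
        return 0.0
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def peak_plus_accumulation(scores: Sequence[float],
                           categories: Sequence[str],
                           turn_threshold: float = 0.3,      # assumed value
                           persistence_weight: float = 0.3,  # assumed value
                           diversity_weight: float = 0.2,    # assumed value
                           total_categories: int = 5) -> float:  # assumed value
    """Hypothetical sketch, NOT the paper's formula.

    Combines the peak per-turn score with two bonuses:
    - persistence: share of turns beyond the first that are flagged
      (an assumed definition, so a lone suspicious turn adds nothing);
    - diversity: distinct categories among flagged turns, as a fraction
      of an assumed total number of categories.
    """
    if not scores:
        return 0.0
    flagged = [c for s, c in zip(scores, categories) if s >= turn_threshold]
    peak = max(scores)
    if len(scores) > 1 and flagged:
        persistence = (len(flagged) - 1) / (len(scores) - 1)
    else:
        persistence = 0.0
    diversity = min(1.0, len(set(flagged)) / total_categories)
    combined = peak + persistence_weight * persistence + diversity_weight * diversity
    return min(1.0, combined)


# One suspicious turn versus a 20-turn persistent attack, same per-turn score.
single_scores = [0.5]
single_cats = ["instruction_override"]  # hypothetical category name
attack_scores = [0.5] * 20
attack_cats = ["instruction_override", "data_exfiltration"] * 10  # hypothetical

print(weighted_average(single_scores), weighted_average(attack_scores))
print(peak_plus_accumulation(single_scores, single_cats),
      peak_plus_accumulation(attack_scores, attack_cats))
```

Under these assumptions, the two weighted averages printed on the first line are equal, while the persistent attack receives a strictly higher score than the single turn on the second line. Both functions use only per-turn scores and category labels, so no LLM call is involved.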

Taxonomy

Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques · Information and Cyber Security