Session Risk Memory (SRM): Temporal Authorization for Deterministic Pre-Execution Safety Gates
Florin Adrian Chitan

TL;DR
This paper presents Session Risk Memory (SRM), a lightweight module that enhances deterministic safety gates with trajectory-level authorization, improving detection of distributed attacks over agent sessions without additional training or inference complexity.
Contribution
SRM introduces a session-level risk assessment mechanism that extends existing safety gates with minimal overhead, enabling detection of complex multi-step malicious behaviors.
Findings
SRM achieves a perfect F1 score with zero false positives on a multi-turn benchmark of 80 sessions.
SRM adds less than 250 microseconds of overhead per turn.
SRM outperforms stateless gates in detecting distributed attacks.
Abstract
Deterministic pre-execution safety gates evaluate whether individual agent actions are compatible with their assigned roles. While effective at per-action authorization, these systems are structurally blind to distributed attacks that decompose harmful intent across multiple individually compliant steps. This paper introduces Session Risk Memory (SRM), a lightweight deterministic module that extends stateless execution gates with trajectory-level authorization. SRM maintains a compact semantic centroid representing the evolving behavioral profile of an agent session and accumulates a risk signal through an exponential moving average over baseline-subtracted gate outputs. It operates on the same semantic vector representation as the underlying gate, requiring no additional model components, training, or probabilistic inference. We evaluate SRM on a multi-turn benchmark of 80 sessions…
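The mechanism the abstract describes (a running centroid plus an exponential moving average over baseline-subtracted gate outputs) can be illustrated with a minimal sketch. This is not the paper's implementation: the class name, the parameter values (`alpha`, `baseline`, `threshold`), the running-mean centroid update, and the clipping of the excess at zero are all assumptions made for illustration, and the centroid is tracked here without feeding into the decision.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SessionRiskMemory:
    """Illustrative session-level risk tracker (hypothetical names and defaults)."""
    alpha: float = 0.5       # assumed EMA smoothing factor
    baseline: float = 0.2    # assumed gate output for benign actions
    threshold: float = 0.16  # assumed session-level flagging threshold
    risk: float = 0.0
    centroid: Optional[List[float]] = None
    turns: int = 0

    def update(self, vector: List[float], gate_score: float) -> bool:
        """Fold one turn into the session state; return True if the session is flagged."""
        self.turns += 1
        if self.centroid is None:
            self.centroid = list(vector)
        else:
            # Running mean of the action vectors seen so far (assumed update rule).
            self.centroid = [c + (v - c) / self.turns
                             for c, v in zip(self.centroid, vector)]
        # Baseline-subtracted gate output, clipped at zero (clipping is an assumption).
        excess = max(0.0, gate_score - self.baseline)
        # Exponential moving average of the excess risk across turns.
        self.risk = (1.0 - self.alpha) * self.risk + self.alpha * excess
        return self.risk >= self.threshold


# Made-up session: every action scores below a hypothetical per-action limit,
# so a stateless gate passes each one, while the accumulated risk keeps rising.
srm = SessionRiskMemory()
per_action_limit = 0.5
scores = [0.4, 0.4, 0.4, 0.4]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
stateless_flags = [s >= per_action_limit for s in scores]
session_flags = [srm.update(v, s) for v, s in zip(vectors, scores)]
```

In this toy session the stateless check never fires, while the session-level signal crosses its threshold on the third turn. The state is a handful of floats and the update is linear in the vector dimension, which is consistent with the low per-turn overhead the paper reports, though the numbers here are not taken from it.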
Taxonomy
Topics: Security and Verification in Computing · Software System Performance and Reliability · Access Control and Trust
