PRISM: Generation-Time Detection and Mitigation of Secret Leakage in Multi-Agent LLM Pipelines
Riya Tapwal, Abhishek Kumar, Carsten Maple

TL;DR
PRISM is a real-time detection and mitigation system for preventing secret leakage in multi-agent LLM pipelines by analyzing generation dynamics and structural cues during decoding.
Contribution
It introduces a novel sequential risk scoring method that detects potential leakage early, outperforming existing static and post-generation defenses.
Findings
PRISM achieves an F1 score of 0.832 with perfect precision on a comprehensive benchmark.
It reduces task-level leakage to 0% while maintaining high output utility.
It significantly outperforms the Span Tagger baseline in detection performance.
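The reported F1 of 0.832 with perfect precision pins down the recall, assuming the standard harmonic-mean definition of F1 on a single positive class (the summary does not say whether the score is averaged in some other way). A short derivation under that assumption:

```latex
F_1 = \frac{2PR}{P + R}, \qquad P = 1
\;\Longrightarrow\;
F_1 = \frac{2R}{1 + R}
\;\Longrightarrow\;
R = \frac{F_1}{2 - F_1} = \frac{0.832}{1.168} \approx 0.712
```

Under that reading, roughly 71% of leakage cases are flagged, and none of the flagged cases are false alarms.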
Abstract
Multi-agent LLM systems introduce a security risk in which sensitive information accessed by one agent can propagate through shared context and reappear in downstream outputs, even without explicit adversarial intent. We formalise this phenomenon as propagation amplification, where leakage risk increases across agent boundaries as sensitive content is repeatedly exposed to downstream generators. Existing defences, including prompt-based safeguards, static pattern matching, and LLM-as-judge filtering, are not designed for this setting: they either operate after generation, rely primarily on surface-form patterns, or add substantial latency without modelling the generation process itself. To resolve these issues, we propose PRISM, a real-time defence that treats credential leakage as a sequential risk accumulation problem during generation. At each decoding step, PRISM combines 16 signals…
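The idea of treating leakage as risk that accumulates over decoding steps can be illustrated with a minimal sketch. This is not PRISM's implementation: the abstract as shown here is truncated before it says what the 16 signals are or how they are combined, so the weighted sum, the decay factor, the threshold, and all names below (`RiskAccumulator`, `first_flagged_step`) are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RiskAccumulator:
    """Running risk score updated once per decoding step (illustrative only)."""

    decay: float = 0.9      # hypothetical: fraction of past risk carried forward
    threshold: float = 1.0  # hypothetical: score at which generation is flagged
    score: float = 0.0

    def update(self, signals: Sequence[float], weights: Sequence[float]) -> bool:
        """Fold one step's signal values into the score; return True if flagged."""
        if len(signals) != len(weights):
            raise ValueError("signals and weights must have the same length")
        # Assumed combination rule: a plain weighted sum of per-step signals.
        step_risk = sum(w * s for w, s in zip(weights, signals))
        # Assumed accumulation rule: exponentially decayed running total.
        self.score = self.decay * self.score + step_risk
        return self.score >= self.threshold


def first_flagged_step(
    per_step_signals: Sequence[Sequence[float]],
    weights: Sequence[float],
    decay: float = 0.9,
    threshold: float = 1.0,
) -> Optional[int]:
    """Return the index of the first step whose accumulated risk crosses the
    threshold, or None if the whole sequence stays below it."""
    acc = RiskAccumulator(decay=decay, threshold=threshold)
    for step, signals in enumerate(per_step_signals):
        if acc.update(signals, weights):
            return step
    return None
```

The point of the sketch is the design choice the abstract describes: because the score is updated at every step, a generation can be flagged partway through, before a full secret has been emitted, rather than being filtered only after the output is complete.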
