TL;DR
BodhiPromptShield is a policy-aware framework for LLM/VLM agents that detects sensitive spans in prompts and mediates how they propagate into retrieval queries, memory writes, tool calls, and logs.
Contribution
It adds propagation-aware mediation and treats restoration timing as an explicit security variable, going beyond de-identification methods that operate only at document boundaries.
Findings
Stage-wise propagation of sensitive content fell from 10.7% to 7.1% across the retrieval, memory, and tool stages on the Controlled Prompt-Privacy Benchmark (CPPB).
PER reached 9.3% with 0.94 AC and 0.92 TSR.
Outperforms generic de-identification in controlled benchmark evaluations.
Abstract
In LLM/VLM agents, prompt privacy risk propagates beyond a single model call because raw user content can flow into retrieval queries, memory writes, tool calls, and logs. Existing de-identification pipelines address document boundaries but not this cross-stage propagation. We propose BodhiPromptShield, a policy-aware framework that detects sensitive spans, routes them via typed placeholders, semantic abstraction, or secure symbolic mapping, and delays restoration to authorized boundaries. Relative to enterprise redaction, this adds explicit propagation-aware mediation and restoration timing as a security variable. Under controlled evaluation on the Controlled Prompt-Privacy Benchmark (CPPB), stage-wise propagation suppresses from 10.7% to 7.1% across retrieval, memory, and tool stages; PER reaches 9.3% with 0.94 AC and 0.92 TSR, outperforming generic de-identification. These are…
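The abstract describes detecting sensitive spans, routing them through typed placeholders, and delaying restoration until an authorized boundary. The sketch below illustrates only that one route, under loud assumptions: the regex detectors, the `PlaceholderVault` class, the stage names, and the `AUTHORIZED_STAGES` policy are all hypothetical and are not taken from the paper. Semantic abstraction and secure symbolic mapping are not shown.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detectors (assumption): the paper's detector is not specified here.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

# Hypothetical policy (assumption): only these stages may see raw values.
AUTHORIZED_STAGES = {"final_response"}


@dataclass
class PlaceholderVault:
    """Maps typed placeholders such as [EMAIL_1] to the raw spans they replace."""
    mapping: dict = field(default_factory=dict)   # placeholder -> raw value
    counters: dict = field(default_factory=dict)  # span type -> count issued

    def _placeholder_for(self, kind: str, raw: str) -> str:
        # Reuse the placeholder if the same raw value was already masked.
        for placeholder, value in self.mapping.items():
            if value == raw:
                return placeholder
        self.counters[kind] = self.counters.get(kind, 0) + 1
        placeholder = f"[{kind}_{self.counters[kind]}]"
        self.mapping[placeholder] = raw
        return placeholder

    def mask(self, text: str) -> str:
        """Replace every detected span with a typed placeholder."""
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(
                lambda m, kind=kind: self._placeholder_for(kind, m.group(0)),
                text,
            )
        return text

    def restore(self, text: str, stage: str) -> str:
        """Put raw values back only at an authorized stage; otherwise leave masked."""
        if stage not in AUTHORIZED_STAGES:
            return text
        for placeholder, raw in self.mapping.items():
            text = text.replace(placeholder, raw)
        return text


vault = PlaceholderVault()
prompt = "Email alice@example.com or call +1 555-123-4567"
masked = vault.mask(prompt)

# Retrieval, memory, and tool stages only ever see the masked text.
tool_input = vault.restore(masked, stage="tool_call")
# Restoration is deferred until the authorized boundary.
final_text = vault.restore(masked, stage="final_response")
```

In this sketch the masked prompt is what would flow into retrieval queries, memory writes, tool calls, and logs, while the mapping stays in the vault. Where restoration is permitted is a policy decision, which is the sense in which restoration timing becomes a security variable.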
