Privacy Policy Enforcement Guardrails for Data-Sensitive Retrieval-Augmented Generation

Osama Zafar; Alexander Nemecek; Yiqian Zhang; Wenbiao Li; Debargha Ganguly; Vikash Singh; Vipin Chaudhary; Erman Ayday

arXiv:2605.17034 · cs.LG · May 19, 2026

TL;DR

This paper presents a privacy policy enforcement framework for data-sensitive retrieval-augmented generation (RAG) systems. It detects contextual data leakage at millisecond latency using dual one-class density estimators trained on synthetic data.

Contribution

It introduces a Privacy Policy Enforcement (PPE) framework that combines dual one-class density estimators with a calibrated abstain region for out-of-distribution inputs, and it outperforms Gaussian Mixture baselines on borderline-safe stress tests.

Findings

01. Achieves a borderline AUROC of 0.93+ on stress tests.

02. Reduces false positives by 44-55 percentage points.

03. Maintains millisecond latency for real-time detection.
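
The two headline metrics can be made concrete with a short sketch. It computes AUROC as the probability that a randomly chosen leaking item scores above a randomly chosen safe item, and it expresses a false-positive change in percentage points (an absolute difference between two rates, not a relative one). The function names, scores, and rates below are made-up toy values for illustration, not results or code from the paper.

```python
def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUROC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def false_positive_rate(neg_scores, threshold):
    """Share of safe items whose score is at or above the flagging threshold."""
    return sum(s >= threshold for s in neg_scores) / len(neg_scores)


def percentage_point_drop(rate_before, rate_after):
    """Absolute difference between two rates, in percentage points."""
    return (rate_before - rate_after) * 100.0


# Toy detector scores (hypothetical): higher means "more likely a leak".
leak_scores = [0.9, 0.8, 0.4]
safe_scores = [0.7, 0.3, 0.2, 0.1]

print(auroc(leak_scores, safe_scores))
print(false_positive_rate(safe_scores, threshold=0.5))
print(percentage_point_drop(0.75, 0.25))
```

A drop from a 75% to a 25% false-positive rate is 50 percentage points, even though the relative reduction is about 67%; the finding above is stated in the former unit.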

Abstract

Standard PII filters often miss contextual data leakage in RAG systems, such as non-regulated attribute clusters that collectively identify individuals. We introduce a Privacy Policy Enforcement (PPE) framework using dual one-class density estimators with fused text embeddings and a calibrated abstain region for out-of-distribution inputs. Using an axis-stratified, multi-LLM synthetic data pipeline across medicine, finance, and law, we found that traditional Gaussian Mixture baselines fail on borderline-safe stress tests by focusing on linguistic register rather than content. Our proposed T3+OCSVM detector, trained on safe and borderline-safe data, achieves a borderline AUROC of 0.93+ while reducing false positives by 44-55 percentage points and maintaining millisecond latency. Compared to supervised MLP classifiers or 14B-parameter LLM judges, our framework offers superior…
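
The decision logic described in the abstract (two one-class density models plus an abstain region for out-of-distribution inputs) might be sketched as follows. This is a minimal illustration under several assumptions, not the authors' T3+OCSVM detector: it swaps in a Gaussian kernel density estimate for the one-class SVM, assumes one model is fit on safe text and the other on sensitive text, uses hand-written 2-D points in place of fused text embeddings, and hard-codes `abstain_floor` where the paper calibrates the abstain region. All names here are hypothetical.

```python
import math


def kde_log_density(x, train, bandwidth=0.5):
    """Log of a Gaussian kernel density estimate at x (a one-class model)."""
    dim = len(x)
    log_norm = -0.5 * dim * math.log(2 * math.pi * bandwidth ** 2)
    log_terms = [
        -sum((a - b) ** 2 for a, b in zip(x, t)) / (2 * bandwidth ** 2)
        for t in train
    ]
    # Log-sum-exp keeps the average of tiny kernel values numerically stable.
    top = max(log_terms)
    log_sum = top + math.log(sum(math.exp(v - top) for v in log_terms))
    return log_norm + log_sum - math.log(len(train))


def decide(x, safe_train, sensitive_train, abstain_floor=-6.0):
    """Return 'allow', 'block', or 'abstain' for one embedding.

    Assumption: an input that neither model finds plausible is treated as
    out-of-distribution and deferred instead of being forced into a class.
    """
    score_safe = kde_log_density(x, safe_train)
    score_sensitive = kde_log_density(x, sensitive_train)
    if max(score_safe, score_sensitive) < abstain_floor:
        return "abstain"
    return "allow" if score_safe > score_sensitive else "block"


# Toy 2-D "embeddings" (hypothetical stand-ins for fused text embeddings).
safe_train = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3)]
sensitive_train = [(3.0, 3.0), (3.2, 2.9), (2.8, 3.1)]

print(decide((0.1, 0.1), safe_train, sensitive_train))
print(decide((3.0, 3.0), safe_train, sensitive_train))
print(decide((10.0, -10.0), safe_train, sensitive_train))
```

Keeping a separate density model per class, instead of a single discriminative boundary, is what makes the abstain case expressible: a point far from both training sets receives a low density under both models, whereas a binary classifier would still assign it to one side.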
