TL;DR
LPG (Latent Policy Guardrail) is a dynamic safety guardrail for AI systems that balances reasoning capability against latency by compressing policy deliberation into latent states, enabling fast and accurate safety judgments.
Contribution
The paper presents LPG, a framework that learns semantic latent deliberation over dynamic safety policies, so the guardrail can adapt to policies specified at inference time without retraining while improving speed and accuracy.
Findings
LPG-4B achieves 84.5% safety accuracy and 77.9% F1 score.
LPG runs approximately 11 times faster than Qwen3-4B-Thinking.
LPG outperforms the strongest dynamic baseline on policy guardrail benchmarks.
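The accuracy and F1 figures above measure different things, which is why they can differ by several points on the same benchmark. The sketch below shows the standard definitions for a binary safe/unsafe classifier. The function name and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
def accuracy_and_f1(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (accuracy, F1) from binary confusion-matrix counts.

    "Positive" here is assumed to mean "flagged as unsafe"; this labeling
    convention is an assumption, not something the source specifies.
    """
    total = tp + fp + tn + fn
    # Accuracy: fraction of all decisions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the items flagged unsafe, how many really were.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly unsafe items, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall; ignores true negatives.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical counts, for illustration only (not results from the paper).
acc, f1 = accuracy_and_f1(tp=30, fp=10, tn=50, fn=10)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```

Because F1 ignores true negatives, a guardrail evaluated on data where most inputs are safe can score higher on accuracy than on F1, as in this toy example.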
Abstract
Guardrails are a critical safety layer for modern AI systems, but their operating regime is changing. As LLMs are deployed as customized assistants, safety policies are increasingly specified at inference time by users, organizations, or regulatory contexts. This makes safety enforcement fundamentally dynamic: the guardrail should adapt to changing safety policies without retraining. Yet this requirement creates a fundamental tension: faithfully judging complex policy contexts demands reasoning capability, while practical deployment requires low-latency responses. We introduce Latent Policy Guardrail (LPG), a guardrail framework that learns semantic latent deliberation over dynamic policies. LPG compresses the internal deliberation needed for intent interpretation and policy grounding into continuous states supervised by decision-relevant semantics. At inference time, it generates only a…
