LiSA: Lifelong Safety Adaptation via Conservative Policy Induction

Minbeom Kim; Lesly Miculicich; Bhavana Dalvi Mishra; Mihir Parmar; Phillip Wallis; Bharath Chandrasekhar; Kyomin Jung; Tomas Pfister; Long T. Le

arXiv:2605.14454 · cs.LG · May 15, 2026

TL;DR

LiSA is a framework that enhances AI safety guardrails by converting sparse failure feedback into reusable policies, improving robustness and adaptability in real-world deployment environments.

Contribution

LiSA introduces a conservative policy induction method with structured memory and confidence gating to adapt guardrails using limited, noisy feedback.
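This summary does not spell out how the confidence gating works, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a hypothetical memory that maps a context key to an induced policy with confirmation and contradiction counts, and it lets an induced policy override the base guardrail only when a Wilson score lower bound on its confirmation rate clears a threshold. The names `InducedPolicy`, `gated_decision`, `context_key`, and the threshold of 0.6 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class InducedPolicy:
    """Hypothetical memory entry: a rule induced from reported failures."""
    context_key: str        # e.g. "share_calendar:external" (illustrative)
    decision: str           # "allow" or "block"
    confirmations: int = 0  # feedback items that agreed with this rule
    contradictions: int = 0 # feedback items that disagreed with it

    def confidence_lower_bound(self, z: float = 1.96) -> float:
        """Wilson score lower bound on the confirmation rate."""
        n = self.confirmations + self.contradictions
        if n == 0:
            return 0.0
        p = self.confirmations / n
        denom = 1 + z * z / n
        centre = p + z * z / (2 * n)
        margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
        return (centre - margin) / denom


def gated_decision(base_decision, context_key, memory, threshold=0.6):
    """Use the induced policy only when its evidence is strong enough.

    Falling back to the base guardrail is the conservative default: a rule
    backed by one or two (possibly noisy) reports does not change behaviour.
    """
    policy = memory.get(context_key)
    if policy is not None and policy.confidence_lower_bound() >= threshold:
        return policy.decision
    return base_decision


if __name__ == "__main__":
    memory = {"k": InducedPolicy("k", "block", confirmations=1)}
    print(gated_decision("allow", "k", memory))  # weak evidence: base decision
    memory["k"] = InducedPolicy("k", "block", confirmations=20, contradictions=1)
    print(gated_decision("allow", "k", memory))  # strong evidence: induced policy
```

Using a lower bound rather than the raw confirmation rate is one way to make the gate conservative under sparse feedback, since a rule with a single supporting report scores low even though its raw rate is 100%.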

Findings

01

LiSA outperforms memory-based baselines under sparse feedback.

02

LiSA remains robust with up to 20% label-flip noise.

03

LiSA improves latency-performance trade-offs beyond baseline models.
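For readers unfamiliar with the term in finding 02, label-flip noise is usually simulated by inverting each feedback label independently with a fixed probability. The sketch below shows that generic procedure under the assumption of binary labels (0 for safe, 1 for unsafe). The function name `flip_labels` is hypothetical, and the paper's exact noise protocol may differ.

```python
import random


def flip_labels(labels, flip_rate, seed=0):
    """Return a copy of binary labels with each one flipped with prob. flip_rate."""
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    return [1 - y if rng.random() < flip_rate else y for y in labels]


if __name__ == "__main__":
    clean = [0, 1] * 5000
    noisy = flip_labels(clean, flip_rate=0.20)
    flipped = sum(a != b for a, b in zip(clean, noisy))
    print(f"flipped fraction: {flipped / len(clean):.3f}")
```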

Abstract

As AI agents move from chat interfaces to systems that read private data, call tools, and execute multi-step workflows, guardrails become a last line of defense against concrete deployment harms. In these settings, guardrail failures are no longer merely answer-quality errors: they can leak secrets, authorize unsafe actions, or block legitimate work. The hardest failures are often contextual: whether an action is acceptable depends on local privacy norms, organizational policies, and user expectations that resist pre-deployment specification. This creates a practical gap: guardrails must adapt to their own operating environments, yet deployment feedback is typically limited to sparse, noisy user-reported failures, and repeated fine-tuning is often impractical. To address this gap, we propose LiSA (Lifelong Safety Adaptation), a conservative policy induction framework that improves a…

Code & Models

Datasets

aoiandroid/papers
dataset · 28 downloads
