PolicyBank: Evolving Policy Understanding for LLM Agents
Jihye Choi, Jinsung Yoon, Long T. Le, Somesh Jha, Tomas Pfister

TL;DR
PolicyBank enables LLM agents to autonomously refine their understanding of organizational policies through interaction and feedback, significantly improving compliance in the presence of ambiguous or incomplete specifications.
Contribution
The paper introduces PolicyBank, a memory mechanism that iteratively refines policy understanding, and provides a new testbed for evaluating policy compliance in LLM agents.
Findings
PolicyBank closes up to 82% of policy gaps.
Existing memory mechanisms achieve near-zero success on policy-gap scenarios.
The systematic testbed isolates alignment failures from execution failures.
Abstract
LLM agents operating under organizational policies must comply with authorization constraints typically specified in natural language. In practice, such specifications inevitably contain ambiguities and logical or semantic gaps that cause the agent's behavior to systematically diverge from the true requirements. We ask: by letting an agent evolve its policy understanding through interaction and corrective feedback from pre-deployment testing, can it autonomously refine its interpretation to close specification gaps? We propose PolicyBank, a memory mechanism that maintains structured, tool-level policy insights and iteratively refines them -- unlike existing memory mechanisms that treat the policy as immutable ground truth, reinforcing "compliant but wrong" behaviors. We also contribute a systematic testbed by extending a popular tool-calling benchmark with controlled policy gaps that…
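As a rough illustration of what "structured, tool-level policy insights" that are "iteratively refined" might look like, here is a minimal sketch. It is not the authors' implementation: the class names (`PolicyInsight`, `ToolPolicyMemory`), the `refine` and `lookup` methods, the tool name, and the example policy strings are all hypothetical. It also assumes that corrective feedback simply replaces the stored insight, whereas an actual system would presumably use the LLM to merge the feedback with what is already stored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PolicyInsight:
    """One interpretation of the policy for a single tool (hypothetical schema)."""
    text: str
    revision: int = 1


@dataclass
class ToolPolicyMemory:
    """Toy tool-level memory: maps a tool name to its current policy insight."""
    insights: Dict[str, PolicyInsight] = field(default_factory=dict)

    def lookup(self, tool: str) -> Optional[str]:
        """Return the current insight text for `tool`, or None if nothing is stored."""
        insight = self.insights.get(tool)
        return insight.text if insight else None

    def refine(self, tool: str, feedback: str) -> PolicyInsight:
        """Replace the insight for `tool` with corrective feedback and bump its revision.

        Assumption: feedback overwrites the old text. This stands in for
        whatever refinement step the actual method uses.
        """
        previous = self.insights.get(tool)
        revision = previous.revision + 1 if previous else 1
        updated = PolicyInsight(text=feedback, revision=revision)
        self.insights[tool] = updated
        return updated


# Hypothetical pre-deployment testing loop: two rounds of feedback on one tool.
memory = ToolPolicyMemory()
memory.refine("cancel_reservation", "Allow cancellation within 24 hours of booking.")
memory.refine(
    "cancel_reservation",
    "Allow cancellation within 24 hours of booking, only if no segment has been flown.",
)
```

The point of the sketch is that the stored insight is mutable and keyed by tool, so a later round of feedback can correct an earlier reading instead of reinforcing it.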
