Policy Regularization for Legible Behavior
Michele Persiani, Thomas Hellström

TL;DR
This paper introduces a method to enhance the interpretability of reinforcement learning agents in online settings by regularizing policies to be more legible, making their intentions clearer without altering the core learning process.
Contribution
It proposes a novel policy regularization technique that improves agent legibility by influencing decision boundaries, without modifying the underlying learning algorithm.
Findings
Regularization improves policy interpretability in online RL.
Legible policies make agent intentions more transparent.
Trade-offs exist between optimality and legibility in policies.
Abstract
In Reinforcement Learning, interpretability generally means providing insight into the agent's mechanisms such that its decisions are understandable by an expert upon inspection. This definition, and the resulting methods from the literature, may however fall short in online settings, where the fluency of interactions prohibits deep inspection of the decision-making algorithm. To support interpretability in online settings, it is useful to borrow from the Explainable Planning literature methods that focus on the legibility of the agent, making its intention easily discernible in an observer model. As we propose in this paper, injecting legible behavior into an agent's policy doesn't require modifying components of its learning algorithm. Rather, the agent's optimal policy can be regularized for legibility by evaluating how the policy may produce observations that would make an…
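To make the idea of regularizing a policy for legibility more concrete, the following is a minimal sketch and not the paper's actual formulation, which is truncated in the abstract above. It assumes a tabular setting with one stochastic policy per candidate goal, and an observer that infers the goal from a single observed action by Bayes' rule. The function names (`observer_posterior`, `legible_policy`), the exponent `alpha`, and the reweighting rule itself are all hypothetical choices made for illustration.

```python
def observer_posterior(policies, prior, state, action):
    """Observer's belief P(goal | state, action), via Bayes' rule.

    policies[goal][state][action] is the probability that the policy for
    `goal` picks `action` in `state`. This observer model is an assumption.
    """
    joint = {g: prior[g] * policies[g][state][action] for g in policies}
    total = sum(joint.values())
    if total == 0.0:
        # No candidate policy explains the action: fall back to the prior.
        return dict(prior)
    return {g: p / total for g, p in joint.items()}


def legible_policy(policies, prior, goal, state, alpha=1.0):
    """Reweight the policy for `goal` toward actions that reveal the goal.

    Hypothetical rule: new(a) is proportional to
    base(a) * P(goal | state, a) ** alpha.
    alpha = 0 leaves the base policy unchanged; larger alpha trades
    optimality for legibility.
    """
    base = policies[goal][state]
    weights = {}
    for action, p in base.items():
        belief = observer_posterior(policies, prior, state, action)[goal]
        weights[action] = p * belief ** alpha
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Toy example: two goals that share the ambiguous action "forward".
policies = {
    "A": {"s0": {"left": 0.5, "forward": 0.5, "right": 0.0}},
    "B": {"s0": {"left": 0.0, "forward": 0.5, "right": 0.5}},
}
prior = {"A": 0.5, "B": 0.5}

legible = legible_policy(policies, prior, goal="A", state="s0", alpha=1.0)
```

In this toy case the reweighted policy for goal A shifts probability from the ambiguous action "forward" to "left", which only the policy for A takes. The base policy is used as given, so the learning algorithm that produced it is untouched, consistent with the claim in the abstract.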
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics
