BXRL: Behavior-Explainable Reinforcement Learning
Ram Rachum, Yotam Amitai, Yonatan Nakar, Reuth Mirsky, and Cameron Allen

TL;DR
BXRL introduces a formal framework for explaining behaviors in reinforcement learning by defining behaviors as measurable patterns of actions, enabling more precise and interpretable explanations of agent actions and policies.
Contribution
The paper formalizes behavior in reinforcement learning, introduces a behavior measure, and analyzes how existing explainability methods can be adapted to explain behaviors.
Findings
Defined a behavior measure as any function from policies to real numbers.
Proposed contrastive behaviors for explanation.
Ported HighwayEnv to JAX for behavior analysis.
Abstract
A major challenge of Reinforcement Learning is that agents often learn undesired behaviors that seem to defy the reward structure they were given. Explainable Reinforcement Learning (XRL) methods can answer queries such as "explain this specific action", "explain this specific trajectory", and "explain the entire policy". However, XRL lacks a formal definition for behavior as a pattern of actions across many episodes. We provide such a definition, and use it to enable a new query: "Explain this behavior". We present Behavior-Explainable Reinforcement Learning (BXRL), a new problem formulation that treats behaviors as first-class objects. BXRL defines a behavior measure as any function from policies to real numbers, allowing users to precisely express the pattern of actions that they find interesting and measure how strongly the policy exhibits it. We define contrastive behaviors that…
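As a rough illustration of the idea that a behavior measure maps a policy to a single real number, the sketch below estimates one such measure by Monte Carlo rollouts. Everything in it is an assumption made for illustration: the toy 1-D walk environment, the `Policy` type alias, and the names `rollout`, `right_move_rate`, `always_right`, and `coin_flip` do not come from the paper, and the paper's own experiments (HighwayEnv in JAX) are not reproduced here.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical types for illustration only; not the paper's notation.
State = int
Action = int  # 0 = move left, 1 = move right
Policy = Callable[[State, random.Random], Action]


def rollout(policy: Policy, horizon: int, rng: random.Random) -> List[Tuple[State, Action]]:
    """Run one episode of a toy 1-D walk and return its (state, action) pairs."""
    state = 0
    trajectory = []
    for _ in range(horizon):
        action = policy(state, rng)
        trajectory.append((state, action))
        state += 1 if action == 1 else -1
    return trajectory


def right_move_rate(policy: Policy, episodes: int = 200, horizon: int = 20, seed: int = 0) -> float:
    """Assumed example of a behavior measure: policy -> real number.

    Estimates, over many episodes, the fraction of steps on which the
    policy chooses action 1 ("move right").
    """
    rng = random.Random(seed)
    right_steps = 0
    total_steps = 0
    for _ in range(episodes):
        for _, action in rollout(policy, horizon, rng):
            right_steps += int(action == 1)
            total_steps += 1
    return right_steps / total_steps


def always_right(state: State, rng: random.Random) -> Action:
    """Policy that exhibits the behavior as strongly as possible."""
    return 1


def coin_flip(state: State, rng: random.Random) -> Action:
    """Policy that picks a direction uniformly at random."""
    return rng.randrange(2)


if __name__ == "__main__":
    print("always_right:", right_move_rate(always_right))
    print("coin_flip:   ", right_move_rate(coin_flip))
```

Because the measure takes the whole policy as input rather than one action or one trajectory, two policies can be compared on the same pattern of actions by comparing two numbers, which is the kind of object a "explain this behavior" query would refer to.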
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning
