VRAIL: Vectorized Reward-based Attribution for Interpretable Learning
Jina Kim, Youjin Jang, Jeongjin Han

TL;DR
VRAIL introduces a bi-level, interpretable framework for value-based reinforcement learning that improves training stability and reveals meaningful subgoals through feature attribution.
Contribution
VRAIL presents a novel, model-agnostic approach combining value function estimation with reward shaping for enhanced interpretability in RL.
Findings
Improves training stability and convergence over standard DQN.
Uncovers human-interpretable subgoals like passenger possession.
Demonstrates effectiveness in the Taxi-v3 environment.
Abstract
We propose VRAIL (Vectorized Reward-based Attribution for Interpretable Learning), a bi-level framework for value-based reinforcement learning (RL) that learns interpretable weight representations from state features. VRAIL consists of two stages: a deep learning (DL) stage that fits an estimated value function using state features, and an RL stage that uses this to shape learning via potential-based reward transformations. The estimator is modeled in either linear or quadratic form, allowing attribution of importance to individual features and their interactions. Empirical results on the Taxi-v3 environment demonstrate that VRAIL improves training stability and convergence compared to standard DQN, without requiring environment modifications. Further analysis shows that VRAIL uncovers semantically meaningful subgoals, such as passenger possession, highlighting its ability to produce…
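The abstract describes an estimator in linear or quadratic form over state features, used for potential-based reward transformations. A minimal sketch of that idea follows, using the standard potential-based shaping rule r' = r + γΦ(s') − Φ(s). It is not the authors' implementation: the function names, the two example features (has_passenger, at_destination), the weight values, and the choice to treat the potential of a terminal state as zero are all assumptions made for illustration.

```python
import numpy as np


def linear_potential(features, w):
    """Linear potential Phi(s) = w . x(s); each weight attributes importance to one feature."""
    return float(np.dot(w, features))


def quadratic_potential(features, W):
    """Quadratic potential Phi(s) = x(s)^T W x(s); off-diagonal entries weight feature interactions."""
    return float(features @ W @ features)


def shaped_reward(reward, phi_s, phi_next, gamma=0.99, done=False):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    Assumption: the potential of a terminal next state is taken to be zero.
    """
    next_term = 0.0 if done else gamma * phi_next
    return reward + next_term - phi_s


# Hypothetical features: x(s) = [has_passenger, at_destination].
w = np.array([2.0, 1.0])            # illustrative weights, not learned values
x_s = np.array([0.0, 0.0])          # before pickup
x_next = np.array([1.0, 0.0])       # after pickup

r_shaped = shaped_reward(
    reward=-1.0,                    # illustrative per-step penalty
    phi_s=linear_potential(x_s, w),
    phi_next=linear_potential(x_next, w),
    gamma=0.9,
)
print("shaped reward for the pickup transition:", r_shaped)
```

With weights like these, a transition that turns on the has_passenger feature receives a shaping bonus, which is one way a large weight on a feature could be read as a subgoal signal. In a DQN-style agent, the shaped reward would replace the raw reward in the temporal-difference target.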
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
Methods: Convolution · Dense Connections · Q-Learning · Deep Q-Network
