VRAIL: Vectorized Reward-based Attribution for Interpretable Learning

Jina Kim; Youjin Jang; Jeongjin Han

arXiv:2506.16014 · cs.LG · September 26, 2025

TL;DR

VRAIL introduces a bi-level, interpretable framework for value-based reinforcement learning that improves training stability and reveals meaningful subgoals through feature attribution.

Contribution

VRAIL presents a novel, model-agnostic approach combining value function estimation with reward shaping for enhanced interpretability in RL.

Findings

01. Improves training stability and convergence over standard DQN.

02. Uncovers human-interpretable subgoals such as passenger possession.

03. Demonstrates effectiveness in the Taxi-v3 environment.

Abstract

We propose VRAIL (Vectorized Reward-based Attribution for Interpretable Learning), a bi-level framework for value-based reinforcement learning (RL) that learns interpretable weight representations from state features. VRAIL consists of two stages: a deep learning (DL) stage that fits an estimated value function using state features, and an RL stage that uses this to shape learning via potential-based reward transformations. The estimator is modeled in either linear or quadratic form, allowing attribution of importance to individual features and their interactions. Empirical results on the Taxi-v3 environment demonstrate that VRAIL improves training stability and convergence compared to standard DQN, without requiring environment modifications. Further analysis shows that VRAIL uncovers semantically meaningful subgoals, such as passenger possession, highlighting its ability to produce…
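
The reward-shaping step described in the abstract can be illustrated with a minimal sketch. It uses the standard potential-based form r' = r + γΦ(s') − Φ(s), with the potential Φ modeled as linear (w·φ) or quadratic (w·φ + φᵀWφ) in the state features. This is not the authors' implementation. The feature choice in `taxi_features`, the function names, the weight values, and the convention of a zero potential at terminal states are all assumptions made for illustration; only the Taxi-v3 state encoding follows the environment's documented layout.

```python
import numpy as np


def taxi_features(state: int) -> np.ndarray:
    """Decode a Taxi-v3 state index into a small feature vector.

    Taxi-v3 encodes a state as
    ((taxi_row * 5 + taxi_col) * 5 + passenger_loc) * 4 + destination,
    where passenger_loc == 4 means the passenger is in the taxi.
    The three features returned here are a hypothetical choice.
    """
    state //= 4                      # drop the destination index
    passenger_loc = state % 5
    state //= 5
    taxi_col = state % 5
    taxi_row = state // 5
    has_passenger = 1.0 if passenger_loc == 4 else 0.0
    return np.array([taxi_row / 4.0, taxi_col / 4.0, has_passenger])


def potential(phi, w, W=None):
    """Linear potential w.phi, plus phi^T W phi when W is given."""
    value = float(w @ phi)
    if W is not None:
        value += float(phi @ W @ phi)
    return value


def shaped_reward(r, phi_s, phi_next, w, W=None, gamma=0.99, done=False):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s).

    Assumption: the potential of a terminal next state is taken as 0.
    """
    next_potential = 0.0 if done else potential(phi_next, w, W)
    return r + gamma * next_potential - potential(phi_s, w, W)


# Illustrative weights (assumed, not learned): only "has_passenger" matters.
w = np.array([0.0, 0.0, 1.0])

before_pickup = taxi_features(261)  # taxi at (2, 3), passenger waiting at location 0
after_pickup = taxi_features(277)   # same taxi cell, passenger now in the taxi

# The -1 step reward is offset by the gain in potential at pickup.
bonus_step = shaped_reward(-1.0, before_pickup, after_pickup, w, gamma=0.9)
```

With weights like these, a large coefficient on the possession feature is what would let a reader interpret "passenger in taxi" as a subgoal. In a quadratic model the off-diagonal entries of `W` would play the same role for feature interactions.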

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics

Methods: Convolution · Dense Connections · Q-Learning · Deep Q-Network