Interpret Policies in Deep Reinforcement Learning using SILVER with RL-Guided Labeling: A Model-level Approach to High-dimensional and Multi-action Environments

Yiyu Qian; Su Nguyen; Chao Chen; Qinyue Zhou; Liyuan Zhao

arXiv:2510.19244·cs.LG·October 27, 2025

TL;DR

This paper introduces SILVER with RL-guided labeling, a method that enhances interpretability of deep RL policies in high-dimensional, multi-action environments by combining feature attribution, RL-guided boundary detection, and surrogate modeling.
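To make the feature-attribution step concrete, the sketch below computes exact Shapley values for a toy action score by enumerating feature coalitions. It is only an illustration: `shapley_values`, `action_score`, the three features, and the all-zeros baseline are hypothetical and do not come from the paper, which uses SHAP-based attribution on features extracted from image observations. Exact enumeration is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values of value_fn at x, relative to a baseline input."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                # Features outside the coalition are set to the baseline.
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi


def action_score(features):
    """Hypothetical score a toy policy assigns to one action."""
    f0, f1, f2 = features
    return 2.0 * f0 - 1.0 * f1 + 0.5 * f0 * f2


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(action_score, x, baseline)
print(phi)
```

The attributions sum to the difference between the score at `x` and the score at the baseline, and the interaction term between the first and third features is split evenly between them.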

Contribution

It extends the SILVER framework to handle complex environments by integrating RL policy outputs into the interpretability process, enabling scalable and behavior-aware explanations.

Findings

01. Maintains competitive task performance in Atari environments.
02. Significantly improves transparency and human understanding.
03. Effective in high-dimensional, multi-action settings.

Abstract

Deep reinforcement learning (RL) achieves remarkable performance but lacks interpretability, limiting trust in policy behavior. The existing SILVER framework (Li, Siddique, and Cao 2025) explains RL policies via Shapley-based regression but remains restricted to low-dimensional, binary-action domains. We propose SILVER with RL-guided labeling, an enhanced variant that extends SILVER to multi-action and high-dimensional environments by incorporating the RL policy's own action outputs into boundary-point identification. Our method first extracts compact feature representations from image observations, performs SHAP-based feature attribution, and then employs RL-guided labeling to generate behaviorally consistent boundary datasets. Surrogate models, such as decision trees and regression-based functions, are subsequently trained to interpret the RL policy's decision structure. We evaluate…
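The labeling and surrogate steps can be pictured with a small sketch. This is one possible reading of "RL-guided labeling", not the authors' algorithm: points near a decision boundary are located by bisecting between two states where the policy acts differently, each point is labeled with the policy's own action, and a regression-based surrogate is fitted to one boundary. Everything here is hypothetical, including the `policy` stand-in, the two features (`position`, `velocity`), the three actions, and the helper names.

```python
import random


def policy(features):
    """Stand-in for a trained RL policy acting on compact features."""
    position, velocity = features
    signal = position + 0.5 * velocity
    if signal < -0.3:
        return "RIGHT"
    if signal > 0.3:
        return "LEFT"
    return "STAY"


def boundary_pair(a, b, steps=20):
    """Bisect the segment a-b until the policy's action switch is localized."""
    act_a = policy(a)
    for _ in range(steps):
        mid = tuple((p + q) / 2 for p, q in zip(a, b))
        if policy(mid) == act_a:
            a = mid  # the switch lies between mid and b
        else:
            b = mid  # the switch lies between a and mid
    return a, b


def build_boundary_dataset(n_pairs, rng):
    """Collect (features, action) rows that straddle decision boundaries."""
    rows = []
    while len(rows) < 2 * n_pairs:
        a = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        b = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if policy(a) == policy(b):
            continue  # no action switch on this segment
        for point in boundary_pair(a, b):
            rows.append((point, policy(point)))  # label comes from the policy
    return rows


def fit_boundary_line(rows, action_pair):
    """Least-squares fit of position = slope * velocity + intercept."""
    pts = []
    for (p, act_p), (q, act_q) in zip(rows[0::2], rows[1::2]):
        if {act_p, act_q} == set(action_pair):
            # Midpoint of a pair that straddles the requested boundary.
            pts.append(((p[0] + q[0]) / 2, (p[1] + q[1]) / 2))
    n = len(pts)
    mean_pos = sum(pos for pos, _ in pts) / n
    mean_vel = sum(vel for _, vel in pts) / n
    cov = sum((vel - mean_vel) * (pos - mean_pos) for pos, vel in pts)
    var = sum((vel - mean_vel) ** 2 for _, vel in pts)
    slope = cov / var
    return slope, mean_pos - slope * mean_vel


rng = random.Random(0)
rows = build_boundary_dataset(200, rng)
slope, intercept = fit_boundary_line(rows, ("STAY", "LEFT"))
print(f"position = {slope:.3f} * velocity + {intercept:.3f}")
```

Because the toy policy switches from STAY to LEFT where `position + 0.5 * velocity` crosses 0.3, the fitted line should land close to a slope of -0.5 and an intercept of 0.3. A decision tree trained on the same labeled rows would be the other kind of surrogate the abstract mentions.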
