Interpret Policies in Deep Reinforcement Learning using SILVER with RL-Guided Labeling: A Model-level Approach to High-dimensional and Multi-action Environments
Yiyu Qian, Su Nguyen, Chao Chen, Qinyue Zhou, Liyuan Zhao

TL;DR
This paper introduces SILVER with RL-guided labeling, a method that enhances interpretability of deep RL policies in high-dimensional, multi-action environments by combining feature attribution, RL-guided boundary detection, and surrogate modeling.
Contribution
It extends the SILVER framework to handle complex environments by integrating RL policy outputs into the interpretability process, enabling scalable and behavior-aware explanations.
Findings
Maintains competitive task performance in Atari environments.
Significantly improves transparency and human understanding.
Effective in high-dimensional, multi-action settings.
Abstract
Deep reinforcement learning (RL) achieves remarkable performance but lacks interpretability, limiting trust in policy behavior. The existing SILVER framework (Li, Siddique, and Cao 2025) explains RL policies via Shapley-based regression but remains restricted to low-dimensional, binary-action domains. We propose SILVER with RL-guided labeling, an enhanced variant that extends SILVER to multi-action and high-dimensional environments by incorporating the RL policy's own action outputs into the identification of boundary points. Our method first extracts compact feature representations from image observations, performs SHAP-based feature attribution, and then employs RL-guided labeling to generate behaviorally consistent boundary datasets. Surrogate models, such as decision trees and regression-based functions, are subsequently trained to interpret the RL policy's decision structure. We evaluate…
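The pipeline described in the abstract (attribution, RL-guided labeling of boundary points, surrogate fitting) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the three-action linear policy `WEIGHTS` stands in for a trained agent, exact Shapley enumeration replaces a SHAP library call (feasible only for a handful of features), bisection is one plausible way to locate boundary points, and a least-squares line plays the role of a regression-based surrogate.

```python
import itertools
import math
import random

# Hypothetical stand-in for a trained multi-action policy: three actions scored
# linearly over two compact features (a real agent would be a neural network).
WEIGHTS = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]

def action_scores(state):
    return [sum(w * s for w, s in zip(ws, state)) for ws in WEIGHTS]

def policy_action(state):
    scores = action_scores(state)
    return max(range(len(scores)), key=scores.__getitem__)

def shapley_values(value_fn, state, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(state)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(len(others) + 1):
            weight = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
            for subset in itertools.combinations(others, r):
                without_i = [state[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = state[i]
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi

def find_boundary_pair(a, b, tol=1e-6):
    """Bisect segment a-b, keeping the policy's action at a fixed and b different."""
    act_a = policy_action(a)
    while math.dist(a, b) > tol:
        mid = [(p + q) / 2 for p, q in zip(a, b)]
        if policy_action(mid) == act_a:
            a = mid
        else:
            b = mid
    return a, b

def build_boundary_dataset(n_pairs=500, seed=0):
    """RL-guided labeling: each boundary point is labeled by the policy's own actions."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_pairs):
        a = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        b = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        if policy_action(a) == policy_action(b):
            continue  # no decision boundary between these two states
        a, b = find_boundary_pair(a, b)
        point = [(p + q) / 2 for p, q in zip(a, b)]
        label = tuple(sorted((policy_action(a), policy_action(b))))
        dataset.append((point, label))
    return dataset

def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept (the surrogate)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Surrogate for the boundary between actions 0 and 1 of the toy policy.
boundary_01 = [pt for pt, label in build_boundary_dataset() if label == (0, 1)]
slope, intercept = fit_line(boundary_01)
```

For this toy policy, actions 0 and 1 tie where the two features are equal, so the fitted line should come out close to the identity line. In the multi-action setting, taking labels from the policy itself keeps each boundary point tied to the specific pair of actions that actually meet there, which is the role the abstract assigns to RL-guided labeling.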
