Temporal-Spatial Causal Interpretations for Vision-Based Reinforcement Learning
Wenjie Shi, Gao Huang, Shiji Song, Cheng Wu

TL;DR
This paper introduces TSCI, a novel method for interpreting vision-based reinforcement learning agents by uncovering temporal-spatial causal features, thereby enhancing understanding of their long-term decision-making processes.
Contribution
The paper proposes a temporal-spatial causal interpretation model that captures long-term causal relations in vision-based RL, addressing limitations of existing methods.
Findings
Produces high-resolution, task-relevant attention masks
Effectively uncovers temporal causal features in RL agents
Applicable to recurrent RL agents for causal discovery
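To make the idea of a temporal-spatial attention mask concrete, here is a minimal sketch of a generic mask-based perturbation test over a sequence of frames. It is not the TSCI implementation. The names `apply_mask`, `toy_policy`, and `decision_shift`, the zero baseline, and the toy policy itself are all assumptions introduced for illustration. The sketch only shows that a sparse mask keeping the decision-relevant pixels leaves the action distribution unchanged, while removing those pixels changes it.

```python
import numpy as np

def apply_mask(frames, mask, baseline=0.0):
    """Blend frames with a baseline: kept where mask=1, replaced where mask=0."""
    frames = np.asarray(frames, dtype=float)
    mask = np.asarray(mask, dtype=float)
    return mask * frames + (1.0 - mask) * baseline

def toy_policy(frames):
    """Hypothetical stand-in for an RL policy (assumption, not from the paper):
    action probabilities depend only on the top-left pixel of every frame."""
    signal = frames[:, 0, 0].sum()
    logits = np.array([signal, -signal])
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

def decision_shift(policy, frames, mask):
    """L1 distance between action distributions on original and masked input."""
    return float(np.abs(policy(frames) - policy(apply_mask(frames, mask))).sum())

rng = np.random.default_rng(0)
frames = rng.random((4, 8, 8))   # T=4 frames of 8x8 pixels

keep_relevant = np.zeros_like(frames)
keep_relevant[:, 0, 0] = 1.0     # sparse mask: keep only the relevant pixel

drop_relevant = np.ones_like(frames)
drop_relevant[:, 0, 0] = 0.0     # dense mask: remove only the relevant pixel

shift_keep = decision_shift(toy_policy, frames, keep_relevant)
shift_drop = decision_shift(toy_policy, frames, drop_relevant)
print(shift_keep, shift_drop)
```

In this toy setting the mask has one value per pixel per time step, so it can mark both where and when the input matters for the decision.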
Abstract
Deep reinforcement learning (RL) agents are becoming increasingly proficient in a range of complex control tasks. However, an agent's behavior is usually difficult to interpret due to the introduction of black-box functions, which makes it difficult to earn the trust of users. Although there have been some interesting interpretation methods for vision-based RL, most of them cannot uncover temporal causal information, raising questions about their reliability. To address this problem, we present a temporal-spatial causal interpretation (TSCI) model to understand the agent's long-term behavior, which is essential for sequential decision-making. The TSCI model builds on the formulation of temporal causality, which reflects the temporal causal relations between the sequential observations and decisions of an RL agent. Then a separate causal discovery network is employed to identify temporal-spatial…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics
