'Explaining RL Decisions with Trajectories': A Reproducibility Study
Karim Abdel Sadek, Matteo Nulli, Joan Velja, Jort Vincenti

TL;DR
This reproducibility study evaluates the claims of a paper on explainable reinforcement learning based on trajectory attribution, confirming some claims with new metrics and experiments, while others remain unsupported due to limited original data.
Contribution
The paper reproduces and extends the original work, introducing quantitative metrics and testing different clustering methods to validate and expand on the initial claims.
Findings
Training on fewer trajectories lowers initial state value
Trajectories in a cluster share high-level patterns
Distant trajectories influence agent decisions
Abstract
This work investigates the reproducibility of the paper 'Explaining RL Decisions with Trajectories'. The original paper introduces a novel approach to explainable reinforcement learning based on attributing the decisions of an agent to specific clusters of trajectories encountered during training. We verify the main claims of the paper, which state that (i) training on fewer trajectories induces a lower initial state value, (ii) trajectories in a cluster present similar high-level patterns, (iii) distant trajectories influence the decisions of an agent, and (iv) humans correctly identify the trajectories attributed to the agent's decision. We recover the environments used by the authors from the partial original code they provided for one of the environments (Grid-World), and implement the remaining ones from scratch (Seaquest, HalfCheetah, Breakout and Q*Bert). While we confirm…
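The attribution idea summarised in the abstract (cluster the training trajectories, drop one cluster at a time, and check whether the agent's decision changes) can be illustrated with a toy sketch. This is a minimal sketch under loud assumptions and is not the authors' or the original paper's implementation. Trajectory embeddings are given directly as vectors instead of coming from a sequence encoder. Plain k-means stands in for the clustering step. The data embedding is assumed to be a softmax over the summed trajectory embeddings, scaled by a normalising constant. The hypothetical `toy_policy` (majority action) stands in for an offline RL agent retrained on each complementary dataset. All function names are hypothetical.

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Plain k-means on the rows of x (assumed stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each trajectory embedding to its nearest center.
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centers, keeping the old one if a cluster becomes empty.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def data_embedding(traj_emb, scale):
    """Assumed data embedding: softmax over the summed trajectory embeddings."""
    z = traj_emb.sum(axis=0) / scale
    z = np.exp(z - z.max())  # subtract the max for numerical stability
    return z / z.sum()

def toy_policy(actions):
    """Hypothetical stand-in for an offline RL agent: returns the most common action."""
    values, counts = np.unique(actions, return_counts=True)
    return values[counts.argmax()]

def attribute(traj_emb, traj_actions, k, seed=0):
    """Return the cluster whose removal changes the decision while keeping the
    complementary data embedding closest to the original one (or None)."""
    labels = kmeans(traj_emb, k, seed=seed)
    scale = len(traj_emb)  # assumed normalising constant
    d_orig = data_embedding(traj_emb, scale)
    a_orig = toy_policy(traj_actions)
    best, best_dist = None, np.inf
    for j in range(k):
        keep = labels != j  # complementary dataset: drop cluster j
        if not keep.any():
            continue
        if toy_policy(traj_actions[keep]) == a_orig:
            continue  # removing cluster j does not change the decision
        dist = np.linalg.norm(data_embedding(traj_emb[keep], scale) - d_orig)
        if dist < best_dist:
            best, best_dist = j, dist
    return best, labels

# Usage on synthetic data: three well-separated groups of 2-D "trajectory embeddings".
rng = np.random.default_rng(1)
emb = np.vstack([rng.normal(c, 0.1, size=(n, 2)) for c, n in [(0, 5), (10, 4), (20, 3)]])
acts = np.array([0] * 5 + [1] * 4 + [1] * 3)
best, labels = attribute(emb, acts, k=3)
```

In this toy setting the retraining step is free, which is exactly what makes the real method expensive: each complementary dataset requires training a new agent before its decision can be compared with the original one.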
Taxonomy
Topics: Risk and Safety Analysis
