xSRL: Safety-Aware Explainable Reinforcement Learning -- Safety as a Product of Explainability
Risal Shahriar Shefin, Md Asifur Rahman, Thai Le, Sarra Alqahtani

TL;DR
xSRL is a framework that enhances the safety and explainability of reinforcement learning agents by providing comprehensive local and global explanations, enabling debugging, vulnerability detection, and increased trust for real-world applications.
Contribution
The paper introduces xSRL, a novel explainability framework for RL that combines local and global explanations and supports vulnerability analysis without retraining.
Findings
xSRL improves safety and trustworthiness of RL agents.
It helps identify policy vulnerabilities by probing the agent with adversarial attacks.
User studies confirm increased understanding and confidence in RL decisions.
Abstract
Reinforcement learning (RL) has shown great promise in simulated environments, such as games, where failures have minimal consequences. However, the deployment of RL agents in real-world systems such as autonomous vehicles, robotics, UAVs, and medical devices demands a higher level of safety and transparency, particularly when facing adversarial threats. Safe RL algorithms have been developed to address these concerns by optimizing both task performance and safety constraints. However, errors are inevitable, and when they occur, it is essential that the RL agents can also explain their actions to human operators. This makes trust in the safety mechanisms of RL systems crucial for effective deployment. Explainability plays a key role in building this trust by providing clear, actionable insights into the agent's decision-making process, ensuring that safety-critical decisions are well…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
