xSRL: Safety-Aware Explainable Reinforcement Learning -- Safety as a Product of Explainability

Risal Shahriar Shefin; Md Asifur Rahman; Thai Le; Sarra Alqahtani

arXiv:2412.19311 · cs.AI · December 30, 2024

TL;DR

xSRL is a framework that enhances the safety and explainability of reinforcement learning agents by providing comprehensive local and global explanations, enabling debugging, vulnerability detection, and increased trust for real-world applications.

Contribution

The paper introduces xSRL, a novel explainability framework for RL that combines local and global explanations and supports vulnerability analysis without retraining.

Findings

01 xSRL improves safety and trustworthiness of RL agents.

02 It enables identification of policy vulnerabilities through adversarial attacks.

03 User studies confirm increased understanding and confidence in RL decisions.

Abstract

Reinforcement learning (RL) has shown great promise in simulated environments, such as games, where failures have minimal consequences. However, the deployment of RL agents in real-world systems such as autonomous vehicles, robotics, UAVs, and medical devices demands a higher level of safety and transparency, particularly when facing adversarial threats. Safe RL algorithms have been developed to address these concerns by optimizing both task performance and safety constraints. However, errors are inevitable, and when they occur, it is essential that the RL agents can also explain their actions to human operators. This makes trust in the safety mechanisms of RL systems crucial for effective deployment. Explainability plays a key role in building this trust by providing clear, actionable insights into the agent's decision-making process, ensuring that safety-critical decisions are well…

Code & Models

Repositories

risal-shefin/xsrl (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning