Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies
Dennis Gross, Helge Spieker

TL;DR
This paper introduces a novel approach combining RL policy model checking with co-activation graph analysis to interpret and verify safe decision-making in deep reinforcement learning policies.
Contribution
It presents a new method that integrates model checking and co-activation graphs to analyze and interpret safe RL policies, enhancing understanding of neural network decision processes.
Findings
Effective interpretation of RL policies through co-activation graphs
Identification of unsafe behaviors in RL policies
Applicability demonstrated in various experimental settings
Abstract
Deep reinforcement learning (RL) policies can demonstrate unsafe behaviors and are challenging to interpret. To address these challenges, we combine RL policy model checking--a technique for determining whether RL policies exhibit unsafe behaviors--with co-activation graph analysis--a method that maps neural network inner workings by analyzing neuron activation patterns--to gain insight into the safe RL policy's sequential decision-making. This combination lets us interpret the RL policy's inner workings for safe decision-making. We demonstrate its applicability in various experiments.
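As a rough illustration of the co-activation graph idea mentioned in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that hidden-neuron activations of the policy network have already been recorded for a set of states (for example, states visited while checking a safety property). The function names `co_activation_graph` and `weighted_degree`, the use of Pearson correlation as the edge weight, and the `threshold` parameter are all illustrative assumptions and do not come from the paper.

```python
import numpy as np


def co_activation_graph(activations, threshold=0.5):
    """Build a weighted adjacency matrix from recorded neuron activations.

    activations: array of shape (n_states, n_neurons), one row per state.
    threshold:   hypothetical cutoff; edges with |correlation| below it are dropped.
    """
    acts = np.asarray(activations, dtype=float)
    # Correlate neuron columns with each other (assumed edge weight: Pearson).
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(acts, rowvar=False)
    # Neurons with constant activation yield NaN correlations; treat as no edge.
    corr = np.nan_to_num(corr, nan=0.0)
    # Remove self-loops.
    np.fill_diagonal(corr, 0.0)
    # Keep only sufficiently strong co-activations.
    return np.where(np.abs(corr) >= threshold, corr, 0.0)


def weighted_degree(adjacency):
    """Sum of absolute edge weights per neuron, a simple importance score."""
    return np.abs(adjacency).sum(axis=1)
```

Given such a graph, neurons with a high weighted degree are those that tend to activate together with many others on the recorded states, which is one simple way to look for structure in the policy's decision-making.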
Taxonomy
Topics: Safety Systems Engineering in Autonomy · Occupational Health and Safety Research
