Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies

Dennis Gross; Helge Spieker

arXiv:2501.03142 · cs.AI · January 7, 2025

TL;DR

This paper introduces a novel approach combining RL policy model checking with co-activation graph analysis to interpret and verify safe decision-making in deep reinforcement learning policies.
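One standard way to check a deterministic RL policy for unsafe behavior is to fix the policy's chosen action in every state of the environment MDP, which yields a Markov chain (DTMC), and then compute the probability of ever reaching an unsafe state, the value a PCTL query such as `P=? [F "unsafe"]` asks for. The sketch below illustrates only that idea under stated assumptions: the four-state MDP, the action names and the helpers `induced_dtmc` and `prob_reach` are hypothetical and are not taken from the paper or from COOL-MC. It solves the standard reachability linear system with NumPy.

```python
import numpy as np

# Hypothetical MDP: TRANSITIONS[state][action] -> [(next_state, probability), ...].
# State 2 is a safe absorbing goal; state 3 is the absorbing "unsafe" state.
TRANSITIONS = {
    0: {"left": [(1, 1.0)], "right": [(2, 0.9), (3, 0.1)]},
    1: {"left": [(1, 1.0)], "right": [(2, 0.5), (3, 0.5)]},
    2: {"left": [(2, 1.0)], "right": [(2, 1.0)]},
    3: {"left": [(3, 1.0)], "right": [(3, 1.0)]},
}
UNSAFE = {3}


def induced_dtmc(transitions, policy, n_states):
    """Resolve the MDP's action choice with a deterministic policy -> DTMC matrix."""
    P = np.zeros((n_states, n_states))
    for s, actions in transitions.items():
        for s_next, p in actions[policy(s)]:
            P[s, s_next] += p
    return P


def prob_reach(P, targets):
    """Probability of eventually reaching `targets` from every state.

    States that cannot reach a target get probability 0; for the rest we solve
    x = P x with x fixed to 1 on targets, i.e. (I - P_uu) x_u = P_ut 1.
    """
    n = P.shape[0]
    targets = sorted(targets)
    can_reach = set(targets)
    changed = True
    while changed:  # backward graph search for states with nonzero probability
        changed = False
        for s in range(n):
            if s not in can_reach and any(P[s, t] > 0 for t in can_reach):
                can_reach.add(s)
                changed = True
    x = np.zeros(n)
    x[targets] = 1.0
    unknown = sorted(can_reach - set(targets))
    if unknown:
        A = np.eye(len(unknown)) - P[np.ix_(unknown, unknown)]
        b = P[np.ix_(unknown, targets)].sum(axis=1)
        x[unknown] = np.linalg.solve(A, b)
    return x


if __name__ == "__main__":
    P = induced_dtmc(TRANSITIONS, lambda s: "right", n_states=4)
    risk = prob_reach(P, UNSAFE)[0]
    print(f"P(reach unsafe from state 0) = {risk:.2f}")
    print("safe under a (hypothetical) bound of 0.2:", risk <= 0.2)
```

Changing the policy changes the induced chain and therefore the verdict: with `lambda s: "left" if s == 0 else "right"`, the agent is routed through state 1 and the unsafe-reachability probability from state 0 rises to 0.5.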

Contribution

The paper integrates RL policy model checking with co-activation graph analysis to analyze and interpret safe RL policies, giving insight into how the policy network arrives at its safe decisions.
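A co-activation graph treats each neuron as a node and connects two neurons when their activations vary together across many inputs. The following is a minimal sketch assuming a plain NumPy MLP with ReLU hidden layers, with Pearson correlation (kept when its absolute value is at least 0.5) as the co-activation measure. The function names, the correlation choice and the threshold are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np


def hidden_activations(layers, states):
    """Activations of every hidden ReLU layer, one row per input state.

    `layers` is a list of (W, b) pairs (hypothetical network parameters);
    the output layer is deliberately excluded.
    """
    h = np.asarray(states, dtype=float)
    per_layer = []
    for W, bias in layers:
        h = np.maximum(0.0, h @ W + bias)
        per_layer.append(h)
    return np.concatenate(per_layer, axis=1)


def coactivation_graph(activations, threshold=0.5):
    """Undirected co-activation graph as {(i, j): pearson_corr} with i < j.

    An edge is kept when |corr| >= threshold. Neurons whose activation never
    changes are skipped because their correlation is undefined.
    """
    live = np.flatnonzero(activations.std(axis=0) > 0)
    if len(live) < 2:
        return {}
    corr = np.corrcoef(activations[:, live], rowvar=False)
    edges = {}
    for i in range(len(live)):
        for j in range(i + 1, len(live)):
            if abs(corr[i, j]) >= threshold:
                edges[(int(live[i]), int(live[j]))] = float(corr[i, j])
    return edges


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [(rng.normal(size=(4, 8)), np.zeros(8)),
              (rng.normal(size=(8, 3)), np.zeros(3))]
    states = rng.normal(size=(50, 4))  # stand-in for environment states
    graph = coactivation_graph(hidden_activations(layers, states))
    print(len(graph), "edges")
```

Keeping the signed correlation as the edge weight preserves whether two neurons fire together or in opposition; a graph library can consume the edge dictionary directly if further analysis is needed.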

Findings

1. Effective interpretation of RL policies through co-activation graphs
2. Identification of unsafe behaviors in RL policies
3. Applicability demonstrated in various experimental settings

Abstract

Deep reinforcement learning (RL) policies can demonstrate unsafe behaviors and are challenging to interpret. To address these challenges, we combine RL policy model checking--a technique for determining whether RL policies exhibit unsafe behaviors--with co-activation graph analysis--a method that maps neural network inner workings by analyzing neuron activation patterns--to gain insight into the safe RL policy's sequential decision-making. This combination lets us interpret the RL policy's inner workings for safe decision-making. We demonstrate its applicability in various experiments.
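Combining the two ideas, the inputs fed to the network can be restricted to the states the policy can actually reach, so the co-activation graph reflects the policy's own decision-making rather than arbitrary inputs. The sketch below assumes a successor function that lists every state reachable with nonzero probability and a toy activation matrix; `reachable_states`, `neuron_importance` and the use of weighted degree centrality as an importance score are hypothetical choices for illustration, not the method reported in the paper.

```python
from collections import deque

import numpy as np


def reachable_states(initial, successors, policy):
    """Breadth-first exploration of the chain induced by `policy`.

    `successors(s, a)` must list every state reachable from s under action a
    with nonzero probability (hypothetical interface). Returns states in visit order.
    """
    seen = {initial}
    order = [initial]
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        for s_next in successors(s, policy(s)):
            if s_next not in seen:
                seen.add(s_next)
                order.append(s_next)
                queue.append(s_next)
    return order


def neuron_importance(activations, threshold=0.5):
    """Weighted degree of each neuron in a thresholded |Pearson| co-activation graph.

    `activations` has one row per reachable state and one column per neuron.
    Constant neurons get degree 0 because their correlation is undefined.
    """
    degree = np.zeros(activations.shape[1])
    live = activations.std(axis=0) > 0
    if live.sum() < 2:
        return degree
    corr = np.abs(np.corrcoef(activations[:, live], rowvar=False))
    np.fill_diagonal(corr, 0.0)   # no self-loops
    corr[corr < threshold] = 0.0  # keep only strong co-activations
    degree[live] = corr.sum(axis=1)
    return degree


def corridor_step(s, a):
    """Toy 1-D corridor with states 0..4; moves are clipped at the walls."""
    return [min(max(s + a, 0), 4)]


if __name__ == "__main__":
    states = reachable_states(0, corridor_step, lambda s: 1)  # always move right
    x = np.array(states, dtype=float)[:, None]
    # Hypothetical hidden layer: two ReLU neurons plus one constant neuron.
    acts = np.hstack([np.maximum(0, x), np.maximum(0, 4 - x), np.ones_like(x)])
    print(states, neuron_importance(acts))
```

Neurons with a high weighted degree co-activate strongly with many others on the reachable states; comparing them against the states a model checker flags as risky is one possible way to relate verification results to network internals.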

Code & Models

Repositories

lava-lab/cool-mc (PyTorch, official)

Taxonomy

Topics: Safety Systems Engineering in Autonomy · Occupational Health and Safety Research