Safety-Oriented Pruning and Interpretation of Reinforcement Learning Policies

Dennis Gross; Helge Spieker

arXiv:2409.10218·cs.LG·September 17, 2024

Open Access

TL;DR

This paper presents VERINTER, a method that combines neural network pruning with model checking so that reinforcement learning policies can be simplified while their safety is preserved and made interpretable.

Contribution

The paper introduces VERINTER, an approach that uses model checking to quantify exactly how pruning neural connections changes the safety measurements of an RL policy, supporting both safety guarantees and interpretability.

Findings

01 VERINTER maintains safety in pruned RL policies.

02 The method improves understanding of the policies' safety dynamics.

03 The method is effective across multiple RL settings.

Abstract

Pruning neural networks (NNs) can streamline them but risks removing vital parameters from safe reinforcement learning (RL) policies. We introduce an interpretable RL method called VERINTER, which combines NN pruning with model checking to ensure interpretable RL safety. VERINTER exactly quantifies the effects of pruning and the impact of neural connections on complex safety properties by analyzing changes in safety measurements. This method maintains safety in pruned RL policies and enhances understanding of their safety dynamics, which has proven effective in multiple RL settings.
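
The idea of measuring a safety property before and after pruning a connection can be illustrated with a toy sketch. This is not the authors' implementation: the five-state line world, the slip probability, the two-feature linear policy, and the helper names (`policy_action`, `unsafe_probability`, `prune`) are all assumptions made for illustration, and a hand-rolled fixed-point iteration stands in for a real probabilistic model checker.

```python
from copy import deepcopy

# Hypothetical toy environment: positions 0..4 on a line.
N_STATES = 5
UNSAFE, GOAL = 0, 4   # both are absorbing
SLIP = 0.1            # probability that a move goes the opposite way
START = 2

def policy_action(weights, state):
    """Tiny linear policy over features [position, bias]; 0 = left, 1 = right."""
    features = [float(state), 1.0]
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return 1 if scores[1] > scores[0] else 0

def unsafe_probability(weights, sweeps=2000):
    """Probability of reaching UNSAFE before GOAL from START under the policy.

    Iterates the reachability equations of the induced Markov chain to a
    fixed point (a stand-in for what a model checker would compute).
    """
    p = [0.0] * N_STATES
    p[UNSAFE] = 1.0
    for _ in range(sweeps):
        for s in range(1, GOAL):
            step = 1 if policy_action(weights, s) == 1 else -1
            p[s] = (1 - SLIP) * p[s + step] + SLIP * p[s - step]
    return p[START]

def prune(weights, action, feature):
    """Return a copy of the weights with one connection set to zero."""
    pruned = deepcopy(weights)
    pruned[action][feature] = 0.0
    return pruned

WEIGHTS = [[0.0, 0.5],   # score for "left":  0.5 * bias
           [1.0, 0.0]]   # score for "right": 1.0 * position

baseline = unsafe_probability(WEIGHTS)

# Change in the safety measurement caused by pruning each single connection.
effects = {}
for a in range(2):
    for f in range(2):
        effects[(a, f)] = unsafe_probability(prune(WEIGHTS, a, f)) - baseline

for (a, f), delta in effects.items():
    print(f"prune weight[action={a}][feature={f}]: change in P(unsafe) = {delta:+.4f}")
```

In this toy setting, pruning the bias connection of the "left" action leaves the safety measurement unchanged, while pruning the position connection of the "right" action flips the policy and makes the unsafe state far more likely. That contrast is the kind of per-connection safety information the abstract describes.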

Taxonomy

Topics: Behavioral and Psychological Studies

Methods: Pruning