Interpreting Reinforcement Learning Model Behavior via Koopman with Control
William T. Redman

TL;DR
This paper introduces a method using Koopman operators to interpret RL models as control systems, enabling analysis of their behavior and training progress through dynamical properties like stability and controllability.
Contribution
The paper applies Koopman with control to RL models, demonstrating its effectiveness in analyzing training dynamics and revealing hidden progress indicators.
Findings
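Properties of the fitted linear control models, such as stability and controllability, evolve during training in a task-dependent manner.
Metrics derived from these models can predict an increase in reward even while measured performance is static.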
The framework offers a new way to interpret RL model behavior.
Abstract
Reinforcement learning (RL) models have shown the capability of learning complex behaviors, but quantitatively assessing those behaviors, which is critical for safety assurance and the discovery of novel strategies, is challenging. By viewing RL models as control systems, we hypothesize that data-driven approximations of their associated Koopman operators may provide dynamical information about their behavior, thus enabling greater interpretability. To test this, we apply the Koopman with control framework to RL models trained on several standard benchmark environments and demonstrate that properties of the fitted linear control models, such as stability and controllability, evolve during training in a task-dependent manner. Comparing these metrics across different training epochs or across differently optimized RL models enables an understanding of how they differ. In addition, we find…
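The idea in the abstract can be made concrete with a minimal sketch. It is not the paper's implementation, and it assumes the simplest case, in which the observables are the states themselves (or have already been lifted by the caller). Under that assumption, a linear control model x[k+1] ≈ A x[k] + B u[k] is fit by least squares from logged states and actions, stability is read from the spectral radius of A, and controllability from the rank of the matrix [B, AB, ..., A^(n-1)B]. The function names and the synthetic system in the usage section are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_linear_control_model(X, U):
    """Least-squares fit of x[k+1] ~= A x[k] + B u[k].

    X: array of shape (T+1, n), states (or lifted observables) over time.
    U: array of shape (T, m), control inputs (e.g. the agent's actions).
    Identity observables are an assumption of this sketch.
    """
    X0, X1 = X[:-1], X[1:]
    Z = np.hstack([X0, U])                       # regressors, shape (T, n+m)
    G, *_ = np.linalg.lstsq(Z, X1, rcond=None)   # solves X1 ~= Z @ G
    n = X.shape[1]
    A = G[:n].T                                  # shape (n, n)
    B = G[n:].T                                  # shape (n, m)
    return A, B


def spectral_radius(A):
    """Largest eigenvalue magnitude; below 1 means the discrete-time model is stable."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))


def controllability_rank(A, B):
    """Rank of [B, AB, ..., A^(n-1)B]; equal to n means the pair (A, B) is controllable."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return int(np.linalg.matrix_rank(np.hstack(blocks)))


if __name__ == "__main__":
    # Hypothetical synthetic system standing in for logged RL rollouts.
    rng = np.random.default_rng(0)
    A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
    B_true = np.array([[0.0], [1.0]])
    U = rng.standard_normal((200, 1))
    X = np.zeros((201, 2))
    for k in range(200):
        X[k + 1] = A_true @ X[k] + B_true @ U[k]

    A, B = fit_linear_control_model(X, U)
    print("spectral radius:", spectral_radius(A))
    print("controllability rank:", controllability_rank(A, B))
```

Repeating the fit on rollouts collected at different training checkpoints, and tracking how the two metrics change, would be one way to carry out the comparison across training epochs that the abstract describes.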
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics
