Interpreting Reinforcement Learning Model Behavior via Koopman with Control

William T. Redman

arXiv:2603.19968 · math.OC · March 23, 2026

TL;DR

This paper treats RL models as control systems and fits data-driven approximations of their Koopman operators, so that model behavior and training progress can be analyzed through dynamical properties such as stability and controllability.

Contribution

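The paper applies the Koopman with control framework to RL models trained on several standard benchmark environments. It shows that properties of the fit linear control models change over training and can indicate progress that the reward alone does not show.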
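For orientation, the two properties named here have standard definitions for a discrete-time linear control model. The block below is a sketch in generic notation, not the paper's own: the symbols $x_t$ (state), $u_t$ (input), $g$ (observable map), $A$, $B$, and $n$ (dimension of $z_t$) are assumptions, since this page does not spell out how the paper defines them.

```latex
\begin{align}
  z_{t+1} &\approx A z_t + B u_t, \qquad z_t = g(x_t)
    && \text{(linear control model in observable space)} \\
  \rho(A) &= \max_i \lvert \lambda_i(A) \rvert < 1
    && \text{(asymptotic stability of the uncontrolled dynamics)} \\
  \operatorname{rank}\begin{bmatrix} B & AB & \cdots & A^{n-1}B \end{bmatrix} &= n
    && \text{(controllability, Kalman rank condition)}
\end{align}
```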

Findings

01 Properties of the fit linear control models, such as stability and controllability, evolve during training in a task-dependent manner.

02 Metrics can predict increased reward even when performance is static.

03 The framework offers a new way to interpret RL model behavior.

Abstract

Reinforcement learning (RL) models have shown the capability of learning complex behaviors, but quantitatively assessing those behaviors, which is critical for safety assurance and the discovery of novel strategies, is challenging. By viewing RL models as control systems, we hypothesize that data-driven approximations of their associated Koopman operators may provide dynamical information about their behavior, thus enabling greater interpretability. To test this, we apply the Koopman with control framework to RL models trained on several standard benchmark environments and demonstrate that properties of the fit linear control models, such as stability and controllability, evolve during training in a task-dependent manner. Comparing these metrics across different training epochs or across differently optimized RL models enables an understanding of how they differ. In addition, we find…
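The general idea of fitting a linear control model to trajectory data and reading off stability and controllability can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes identity observables (no lifting dictionary), a plain least-squares fit, and a synthetic two-dimensional system standing in for RL rollouts. The function names `fit_linear_control_model`, `spectral_radius`, and `controllability_rank` are hypothetical.

```python
import numpy as np


def fit_linear_control_model(X, U):
    """Least-squares fit of x_{t+1} ~ A x_t + B u_t.

    X: (T+1, n) array of states (or observables), one row per time step.
    U: (T, m) array of inputs applied at steps 0..T-1.
    Assumption: identity observables, i.e. no nonlinear lifting.
    """
    X0, X1 = X[:-1], X[1:]
    omega = np.hstack([X0, U])                      # (T, n + m)
    # Solve X1 ~ omega @ G, where G stacks A^T on top of B^T.
    G, *_ = np.linalg.lstsq(omega, X1, rcond=None)
    n = X.shape[1]
    A = G[:n].T                                     # (n, n)
    B = G[n:].T                                     # (n, m)
    return A, B


def spectral_radius(A):
    """Largest eigenvalue magnitude; below 1 means stable in discrete time."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))


def controllability_rank(A, B):
    """Rank of the Kalman controllability matrix [B, AB, ..., A^(n-1) B]."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return int(np.linalg.matrix_rank(np.hstack(blocks)))


# Synthetic stand-in for rollout data (hypothetical system, noise-free).
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1],
                   [0.0, 0.8]])
B_true = np.array([[0.0],
                   [1.0]])
T = 200
U = rng.normal(size=(T, 1))
X = np.zeros((T + 1, 2))
X[0] = rng.normal(size=2)
for t in range(T):
    X[t + 1] = A_true @ X[t] + B_true @ U[t]

A_fit, B_fit = fit_linear_control_model(X, U)
rho = spectral_radius(A_fit)
rank = controllability_rank(A_fit, B_fit)
print("spectral radius:", rho)
print("controllability rank:", rank, "of", A_fit.shape[0])
```

Repeating such a fit on data collected at different training checkpoints would yield one value of each metric per checkpoint, which is the kind of across-epoch comparison the abstract describes. Which quantities play the roles of state and input for an RL model is a modeling choice that this sketch leaves open.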

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics