Operator Models for Continuous-Time Offline Reinforcement Learning

Nicolas Hoischen; Petar Bevanda; Max Beier; Stefan Sosnowski; Boris Houska; Sandra Hirche

arXiv:2511.10383 · stat.ML · November 14, 2025

Open Access

TL;DR

This paper introduces an operator-theoretic approach to offline reinforcement learning in continuous-time systems, linking it to the Hamilton-Jacobi-Bellman equation and providing convergence guarantees.

Contribution

The paper develops an algorithm for continuous-time offline RL based on operator theory and reproducing kernel Hilbert spaces, with global convergence guarantees for the value function and finite-sample bounds.

Findings

01. Proposes a new operator-based algorithm for continuous-time offline RL.

02. Establishes global convergence and finite-sample guarantees.

03. Demonstrates promising numerical results.

Abstract

Continuous-time stochastic processes underlie many natural and engineered systems. In healthcare, autonomous driving, and industrial control, direct interaction with the environment is often unsafe or impractical, motivating offline reinforcement learning from historical data. However, there is limited statistical understanding of the approximation errors inherent in learning policies from offline datasets. We address this by linking reinforcement learning to the Hamilton-Jacobi-Bellman equation and proposing an operator-theoretic algorithm based on a simple dynamic programming recursion. Specifically, we represent our world model in terms of the infinitesimal generator of controlled diffusion processes learned in a reproducing kernel Hilbert space. By integrating statistical learning methods and operator theory, we establish global convergence of the value function and derive…
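
For orientation, the objects named in the abstract can be written out in their standard textbook form. This is a sketch under assumed notation, not necessarily the paper's: the drift $b$, diffusion coefficient $\sigma$, reward $r$, discount rate $\rho > 0$, and the discounted infinite-horizon setting are all assumptions made here for illustration.

```latex
% Controlled diffusion (assumed form): state X_t, control u_t, Brownian motion W_t
\mathrm{d}X_t = b(X_t, u_t)\,\mathrm{d}t + \sigma(X_t, u_t)\,\mathrm{d}W_t

% Infinitesimal generator of the diffusion under a fixed control value u,
% acting on a twice-differentiable test function f
(\mathcal{L}^{u} f)(x) = b(x, u)^{\top} \nabla f(x)
  + \tfrac{1}{2}\,\operatorname{Tr}\!\big( \sigma(x, u)\,\sigma(x, u)^{\top}\, \nabla^{2} f(x) \big)

% Discounted value function (assumed objective)
V(x) = \sup_{u(\cdot)} \; \mathbb{E}\!\left[ \int_{0}^{\infty} e^{-\rho t}\, r(X_t, u_t)\,\mathrm{d}t \;\middle|\; X_0 = x \right]

% Hamilton-Jacobi-Bellman equation in stationary discounted form
\rho\, V(x) = \sup_{u} \big\{ r(x, u) + (\mathcal{L}^{u} V)(x) \big\}
```

In this form the dynamics enter the Hamilton-Jacobi-Bellman equation only through the generator $\mathcal{L}^{u}$, which is why an estimate of the generator learned from offline data can serve as a world model for a dynamic programming recursion.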

Taxonomy

Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control