Operator Models for Continuous-Time Offline Reinforcement Learning
Nicolas Hoischen, Petar Bevanda, Max Beier, Stefan Sosnowski, Boris Houska, Sandra Hirche

TL;DR
This paper introduces an operator-theoretic approach to offline reinforcement learning in continuous-time systems, linking it to the Hamilton-Jacobi-Bellman equation and providing convergence guarantees.
Contribution
The paper develops an algorithm based on operator theory and reproducing kernel Hilbert spaces, with global convergence guarantees for the value function and finite-sample bounds for continuous-time offline RL.
Findings
Proposes a new operator-based algorithm for continuous-time offline RL.
Establishes global convergence and finite-sample guarantees.
Demonstrates promising numerical results.
Abstract
Continuous-time stochastic processes underlie many natural and engineered systems. In healthcare, autonomous driving, and industrial control, direct interaction with the environment is often unsafe or impractical, motivating offline reinforcement learning from historical data. However, there is limited statistical understanding of the approximation errors inherent in learning policies from offline datasets. We address this by linking reinforcement learning to the Hamilton-Jacobi-Bellman equation and proposing an operator-theoretic algorithm based on a simple dynamic programming recursion. Specifically, we represent our world model in terms of the infinitesimal generator of controlled diffusion processes learned in a reproducing kernel Hilbert space. By integrating statistical learning methods and operator theory, we establish global convergence of the value function and derive…
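For orientation, the objects named in the abstract can be written in their standard textbook form. This is a sketch under assumed notation, not the paper's own: the drift b, diffusion coefficient σ, running reward r, and discount rate ρ are symbols introduced here for illustration, and the paper may use a cost-minimizing or finite-horizon formulation instead.

```latex
% Controlled diffusion (assumed notation): state X_t, control u_t, Brownian motion W_t
\mathrm{d}X_t = b(X_t, u_t)\,\mathrm{d}t + \sigma(X_t, u_t)\,\mathrm{d}W_t

% Infinitesimal generator of the process under a fixed control value u,
% acting on a twice-differentiable test function f
(\mathcal{L}^{u} f)(x) = b(x,u)^{\top} \nabla f(x)
  + \tfrac{1}{2}\,\operatorname{Tr}\!\left( \sigma(x,u)\,\sigma(x,u)^{\top} \nabla^{2} f(x) \right)

% Discounted infinite-horizon Hamilton-Jacobi-Bellman equation for the optimal value V^*
\rho\, V^{*}(x) = \sup_{u} \left\{ r(x,u) + (\mathcal{L}^{u} V^{*})(x) \right\}
```

In this form the generator is the only place the dynamics enter the HJB equation, which is why a model of the generator estimated from offline data is enough to run a dynamic programming recursion on the value function.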
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control
