Optimal-PhiBE: A PDE-based Model-free framework for Continuous-time Reinforcement Learning

Yuhua Zhu; Yuming Zhang; Haoyu Zhang

arXiv:2506.05208 · math.OC · October 14, 2025

TL;DR

This paper introduces Optimal-PhiBE, a PDE-based framework for continuous-time reinforcement learning that reduces discretization error and sensitivity to the system dynamics and reward structure, enabling more accurate model-free policy learning from discrete-time data.

Contribution

Optimal-PhiBE integrates discrete-time information into a continuous-time PDE framework, improving accuracy and robustness over existing continuous-time reinforcement learning (CTRL) methods.
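For orientation, the two existing formulations contrasted in the abstract can be written as follows. This uses one common convention in our own notation, not necessarily the paper's: dynamics dX_t = b(X_t, a_t) dt + σ(X_t, a_t) dW_t, reward r, discount rate β > 0, Σ = σσ^T, A : B = tr(A^T B), and, in the discrete-time equation, the action held fixed over each step of length Δt. The PDE requires the drift b and diffusion Σ, which must be inferred from discrete-time data; the discrete-time equation needs only sampled transitions but carries a discretization error that vanishes as Δt → 0. Optimal-PhiBE itself is not shown here.

```latex
% Continuous-time optimal value: Hamilton-Jacobi-Bellman (HJB) equation
\beta V(x) = \max_{a}\Big[\, r(x,a) + b(x,a)\cdot\nabla V(x)
  + \tfrac{1}{2}\,\Sigma(x,a) : \nabla^{2} V(x) \Big],
\qquad \Sigma = \sigma\sigma^{\top}

% Discrete-time optimal Bellman equation (Optimal-BE) with step \Delta t
V_{\Delta t}(x) = \max_{a}\, \mathbb{E}\Big[\, r(x,a)\,\Delta t
  + e^{-\beta\Delta t}\, V_{\Delta t}(X_{\Delta t}) \,\Big|\, X_0 = x,\ a_0 = a \Big]
```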

Findings

1. Optimal-PhiBE recovers the optimal policy exactly in the undiscounted case.

2. Optimal-PhiBE outperforms Optimal-BE in weakly discounted or control-dominant scenarios.

3. Numerical experiments confirm the theoretical error bounds and demonstrate the method's effectiveness.

Abstract

This paper addresses continuous-time reinforcement learning (CTRL) where the system dynamics are governed by an unknown stochastic differential equation, and only discrete-time observations are available. Existing approaches face limitations: model-based PDE methods suffer from non-identifiability, while model-free methods based on the discrete-time optimal Bellman equation (Optimal-BE) suffer from large discretization errors that are highly sensitive to both the system dynamics and the reward structure. To overcome these challenges, we introduce Optimal-PhiBE, a formulation that integrates discrete-time information into a continuous-time PDE, combining the strengths of both existing frameworks while mitigating their limitations. Optimal-PhiBE exhibits smaller discretization errors when the uncontrolled system evolves slowly, and demonstrates reduced sensitivity to oscillatory reward…
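To make the point about Optimal-BE discretization error concrete, here is a minimal sketch built on a hypothetical test problem of our own choosing, not one taken from the paper: a one-dimensional, uncontrolled Ornstein–Uhlenbeck process dX = -aX dt + σ dW with reward r(x) = x² and discount rate β. For this model both the exact continuous-time value and the solution of the discrete-time Bellman equation (assumed convention: reward r(x)Δt per step, discount e^{-βΔt}) are quadratic in x and available in closed form. Because there is no control, this illustrates only the Bellman-equation side of the comparison; it does not implement Optimal-PhiBE, and all function names and parameter values are hypothetical.

```python
from __future__ import annotations

import math

# Hypothetical test problem (not from the paper): uncontrolled OU process
#   dX = -a X dt + sigma dW,  reward r(x) = x**2,  discount rate beta.
# Requires a > 0, beta > 0, dt > 0.


def continuous_value_coeffs(a: float, sigma: float, beta: float) -> tuple[float, float]:
    """Exact continuous-time value V(x) = p*x**2 + q."""
    p = 1.0 / (beta + 2.0 * a)
    q = sigma**2 / (beta * (beta + 2.0 * a))
    return p, q


def bellman_value_coeffs(a: float, sigma: float, beta: float, dt: float) -> tuple[float, float]:
    """Solution V_dt(x) = p*x**2 + q of the discrete-time Bellman equation
    V_dt(x) = r(x)*dt + exp(-beta*dt) * E[V_dt(X_dt) | X_0 = x]."""
    # Over one step, E[X_dt**2 | X_0 = x] = x**2 * exp(-2*a*dt) + s.
    s = sigma**2 / (2.0 * a) * -math.expm1(-2.0 * a * dt)
    # Match x**2 terms: p = dt + exp(-beta*dt) * exp(-2*a*dt) * p.
    p = dt / -math.expm1(-(beta + 2.0 * a) * dt)
    # Match constant terms: q = exp(-beta*dt) * (p*s + q).
    q = math.exp(-beta * dt) * p * s / -math.expm1(-beta * dt)
    return p, q


def relative_error(a: float, sigma: float, beta: float, dt: float, x: float) -> float:
    """Relative gap between the discrete-time Bellman value and the exact value at x."""
    p, q = continuous_value_coeffs(a, sigma, beta)
    p_dt, q_dt = bellman_value_coeffs(a, sigma, beta, dt)
    exact = p * x**2 + q
    return abs(p_dt * x**2 + q_dt - exact) / exact


if __name__ == "__main__":
    # Same step size and discount; only the speed of the dynamics (a) changes.
    for a in (0.1, 1.0, 10.0):
        err = relative_error(a, sigma=1.0, beta=0.1, dt=0.1, x=1.0)
        print(f"a = {a:5.1f}  relative error = {err:.4f}")
```

At a fixed Δt the error grows with the mean-reversion rate a, i.e. with how fast the system evolves: to leading order the x² coefficient is off by a factor of about 1 + (β + 2a)Δt/2. Shrinking Δt drives the gap toward zero. The closed forms use `math.expm1` to keep 1 − e^{−c} accurate for small steps.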

Code & Models

Repositories

haz053ucsd/Optimal_PhiBE (PyTorch · Official)

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Extremum Seeking Control Systems