Optimal-PhiBE: A PDE-based Model-free framework for Continuous-time Reinforcement Learning
Yuhua Zhu, Yuming Zhang, Haoyu Zhang

TL;DR
This paper introduces Optimal-PhiBE, a PDE-based framework for continuous-time reinforcement learning that reduces discretization errors and sensitivity issues, enabling more accurate and model-free policy learning from discrete data.
Contribution
Optimal-PhiBE integrates discrete-time information into a continuous-time PDE framework, improving accuracy and robustness over existing methods in continuous-time reinforcement learning (CTRL).
Findings
Optimal-PhiBE recovers the optimal policy exactly in the undiscounted case.
It outperforms Optimal-BE in weakly discounted or control-dominant scenarios.
Numerical experiments verify the theoretical error bounds and effectiveness.
Abstract
This paper addresses continuous-time reinforcement learning (CTRL) where the system dynamics are governed by an unknown stochastic differential equation, and only discrete-time observations are available. Existing approaches face limitations: model-based PDE methods suffer from non-identifiability, while model-free methods based on the discrete-time optimal Bellman equation (Optimal-BE) suffer from large discretization errors that are highly sensitive to both the system dynamics and the reward structure. To overcome these challenges, we introduce Optimal-PhiBE, a formulation that integrates discrete-time information into a continuous-time PDE, combining the strengths of both existing frameworks while mitigating their limitations. Optimal-PhiBE exhibits smaller discretization errors when the uncontrolled system evolves slowly, and demonstrates reduced sensitivity to oscillatory reward…
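As a rough illustration of the discretization error attributed above to discrete-time Bellman formulations, the sketch below uses a toy problem that is not taken from the paper: uncontrolled deterministic dynamics ds/dt = lam * s, reward r(s) = s, and discount rate beta > lam. All names (`continuous_value`, `discrete_bellman_value`, `lam`, `beta`, `dt`) are assumptions made for this example. It covers policy evaluation only and does not implement Optimal-PhiBE; it compares the exact continuous-time value with the fixed point of a discrete-time Bellman recursion at step size dt.

```python
import math


def continuous_value(s, beta, lam):
    # Exact value: integral over t >= 0 of exp(-beta*t) * s*exp(lam*t) dt
    # = s / (beta - lam), valid only when beta > lam.
    return s / (beta - lam)


def discrete_bellman_value(s, beta, lam, dt):
    # Fixed point of V(s) = r(s)*dt + exp(-beta*dt) * V(s*exp(lam*dt))
    # with r(s) = s. Since V is linear in s, this is a geometric series:
    # V(s) = s*dt / (1 - exp(-(beta - lam)*dt)).
    return s * dt / (1.0 - math.exp(-(beta - lam) * dt))


def discretization_error(s, beta, lam, dt):
    # Absolute gap between the discrete-time and continuous-time values.
    exact = continuous_value(s, beta, lam)
    approx = discrete_bellman_value(s, beta, lam, dt)
    return abs(approx - exact)


if __name__ == "__main__":
    for dt in (0.1, 0.01, 0.001):
        print(dt, discretization_error(1.0, beta=1.0, lam=0.5, dt=dt))
```

In this toy setting the gap shrinks roughly in proportion to dt, which is the kind of first-order discretization error that the abstract says Optimal-PhiBE is designed to reduce.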
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Extremum Seeking Control Systems
