PhiBE: A PDE-based Bellman Equation for Continuous Time Policy Evaluation
Yuhua Zhu

TL;DR
This paper introduces PhiBE, a PDE-based Bellman equation for continuous-time policy evaluation in reinforcement learning (RL). By exploiting the smoothness of the underlying dynamics, it approximates the value function more accurately and improves sample complexity.
Contribution
The paper develops PhiBE, a novel PDE-based Bellman equation for continuous-time RL, with theoretical guarantees and a model-free algorithm that improves sample complexity and handles model misspecification.
Findings
PhiBE provides a more accurate approximation than traditional discrete Bellman equations.
The model-free algorithm for PhiBE converges with finite-sample guarantees.
Sample complexity is improved to O(Δt^{-1}) by exploiting dynamics' smoothness.
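The first finding above can be illustrated on a toy problem. The sketch below is not taken from the paper's code. It assumes a deterministic linear ODE ds/dt = lam * s with reward r(s) = s and discount rate beta > lam, so the true value function is V(s) = s / (beta - lam). It also assumes the discrete-time Bellman equation has the form V(s) = r(s) * dt + exp(-beta * dt) * V(s_dt) and that the first-order PhiBE has the form beta * V(s) = r(s) + ((s_dt - s) / dt) * V'(s); neither form is spelled out in this summary. All function and variable names are hypothetical. Because both equations admit a linear solution V(s) = c * s, it is enough to compare the coefficient c.

```python
import math


def true_coeff(beta: float, lam: float) -> float:
    """Coefficient c of the exact value function V(s) = c * s (requires beta > lam)."""
    return 1.0 / (beta - lam)


def discrete_be_coeff(beta: float, lam: float, dt: float) -> float:
    """Solve c = dt + exp(-beta*dt) * exp(lam*dt) * c (assumed discrete-time BE)."""
    return dt / (1.0 - math.exp((lam - beta) * dt))


def phibe_coeff(beta: float, lam: float, dt: float) -> float:
    """Solve beta * c = 1 + drift_hat * c (assumed first-order PhiBE).

    drift_hat is the finite-difference drift (s_dt - s) / (dt * s),
    computed only from the state observed one step later.
    """
    drift_hat = (math.exp(lam * dt) - 1.0) / dt
    return 1.0 / (beta - drift_hat)


def compare(beta: float = 1.0, lam: float = 0.1, dt: float = 0.1) -> tuple[float, float]:
    """Return the absolute coefficient errors (discrete BE, PhiBE)."""
    c_true = true_coeff(beta, lam)
    err_be = abs(discrete_be_coeff(beta, lam, dt) - c_true)
    err_phibe = abs(phibe_coeff(beta, lam, dt) - c_true)
    return err_be, err_phibe


if __name__ == "__main__":
    err_be, err_phibe = compare()
    print(f"discrete BE error: {err_be:.6f}")
    print(f"PhiBE error:       {err_phibe:.6f}")
```

In this toy setting, a Taylor expansion gives a leading error of about dt / 2 for the discrete-time equation and about lam^2 * dt / (2 * (beta - lam)^2) for the PhiBE form, so the gap widens as the dynamics change more slowly (small lam). This is consistent with the abstract's claim but is only a one-dimensional sanity check, not a reproduction of the paper's results.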
Abstract
In this paper, we study policy evaluation in continuous-time reinforcement learning (RL), where the state follows an unknown stochastic differential equation (SDE), but only discrete-time data are available. We first highlight that the discrete-time Bellman equation (BE) is not always a reliable approximation to the true value function because it ignores the underlying continuous-time structure. We then introduce a new Bellman equation, PhiBE, which integrates the discrete-time information into a continuous-time PDE formulation. By leveraging the smooth structure of the underlying dynamics, PhiBE provides a provably more accurate approximation to the true value function, especially in scenarios where the underlying dynamics change slowly or the reward oscillates. Moreover, we extend PhiBE to higher orders, providing increasingly accurate approximations. We further develop a model-free…
Taxonomy
Topics: Climate Change Policy and Economics
