Statistical guarantees for continuous-time policy evaluation: blessing of ellipticity and new tradeoffs
Wenlong Mou

TL;DR
This paper establishes non-asymptotic statistical guarantees for policy evaluation in continuous-time Markov diffusion processes, highlighting the role of ellipticity and the trade-offs between approximation and statistical errors.
Contribution
It provides the first finite-sample analysis of the least-squares temporal-difference (LSTD) method for continuous-time Markov diffusions, leveraging ellipticity to ensure robust performance and revealing new error trade-offs.
Findings
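Achieves an $O(1/\sqrt{T})$ convergence rate in the first-order Sobolev norm from a single trajectory of length $T$
Shows that ellipticity ensures robustness as the effective horizon diverges
Reveals a trade-off between approximation and statistical errors: the Markovian component of the statistical error is controlled by the approximation error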
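Why ellipticity helps as the horizon grows can be seen from a standard calculation, offered here as intuition rather than reproduced from the paper. Assume a diffusion $dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t$ with generator $\mathcal{L}$, stationary distribution $\mu$, discount rate $\beta > 0$ (so the effective horizon is of order $1/\beta$), and value function solving $\beta V - \mathcal{L} V = r$. Write $\Lambda = \sigma\sigma^\top$ and assume uniform ellipticity $\Lambda \succeq \lambda I$ with $\lambda > 0$; this notation is ours.

$$
\begin{aligned}
\mathcal{L} f &= b^\top \nabla f + \tfrac{1}{2}\operatorname{Tr}\big(\Lambda \nabla^2 f\big), \qquad
\mathcal{L}(f^2) = 2 f\,\mathcal{L} f + \nabla f^\top \Lambda \nabla f, \\
\langle f, (\beta - \mathcal{L}) f \rangle_\mu
&= \beta \|f\|_{\mu}^2 + \tfrac{1}{2}\,\mathbb{E}_\mu\big[\nabla f^\top \Lambda \nabla f\big]
\;\ge\; \beta \|f\|_{\mu}^2 + \tfrac{\lambda}{2}\,\|\nabla f\|_{\mu}^2 .
\end{aligned}
$$

The equality uses $\int \mathcal{L}(f^2)\,d\mu = 0$, which follows from stationarity alone (no reversibility is needed). Since the gradient term does not depend on $\beta$, the operator $\beta - \mathcal{L}$ stays coercive in the first-order Sobolev seminorm even as $\beta \to 0$, that is, as the effective horizon diverges.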
Abstract
We study the estimation of the value function for continuous-time Markov diffusion processes using a single, discretely observed ergodic trajectory. Our work provides non-asymptotic statistical guarantees for the least-squares temporal-difference (LSTD) method, with performance measured in the first-order Sobolev norm. Specifically, the estimator attains an $O(1/\sqrt{T})$ convergence rate when using a trajectory of length $T$; notably, this rate is achieved as long as $T$ scales nearly linearly with both the mixing time of the diffusion and the number of basis functions employed. A key insight of our approach is that the ellipticity inherent in the diffusion process ensures robust performance even as the effective horizon diverges to infinity. Moreover, we demonstrate that the Markovian component of the statistical error can be controlled by the approximation error, while the…
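The abstract describes LSTD applied to one discretely observed trajectory with a finite set of basis functions. The sketch below illustrates that idea under assumptions of our own and is not the paper's estimator verbatim: a one-dimensional Ornstein-Uhlenbeck diffusion $dX = -X\,dt + \sigma\,dB$ simulated by Euler-Maruyama with step $\Delta$, reward $r(x) = x^2$, discount rate $\beta$, polynomial basis $(1, x, x^2)$, and the increment $\phi(X_{k+1}) - \phi(X_k)$ standing in for the generator term $\mathcal{L}\phi\,\Delta$. The helper names `features`, `solve_linear`, and `lstd_ou` are hypothetical. With $\beta = \sigma = 1$ the exact value function is $V(x) = (1 + x^2)/3$, so the coefficients should land near $(1/3, 0, 1/3)$ up to $O(\Delta)$ discretization bias and $O(1/\sqrt{T})$ statistical error.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    # Augmented matrix so row operations act on A and b together.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def features(x):
    # Hypothetical basis: polynomials of degree <= 2.
    return [1.0, x, x * x]


def lstd_ou(T=1000.0, dt=0.01, beta=1.0, sigma=1.0, seed=0):
    """LSTD estimate of V(x) = E[int_0^inf e^{-beta t} X_t^2 dt | X_0 = x]
    for the OU process dX = -X dt + sigma dB, from one discretely observed path."""
    rng = random.Random(seed)
    d = 3
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    x = 0.0
    phi = features(x)
    for _ in range(int(T / dt)):
        # Euler-Maruyama step: produces the next discrete observation.
        x_next = x - x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        phi_next = features(x_next)
        reward = x * x  # hypothetical reward r(x) = x^2
        for i in range(d):
            # Discretized (beta - L) phi_j * dt, estimated by beta*phi_j*dt - (phi_j' - phi_j).
            for j in range(d):
                A[i][j] += phi[i] * (beta * phi[j] * dt - (phi_next[j] - phi[j]))
            b[i] += phi[i] * reward * dt
        x, phi = x_next, phi_next
    # Normalizing A and b by T would not change the solution, so it is skipped.
    return solve_linear(A, b)


theta = lstd_ou()
print([round(t, 3) for t in theta])
```

Because the true value function lies in the span of this basis, the approximation error is zero here and only statistical and discretization error remain; with a basis that cannot represent $V$, the approximation error enters as well, which is the regime where the trade-offs described in the abstract matter.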
