Policy evaluation from a single path: Multi-step methods, mixing and mis-specification

Yaqi Duan; Martin J. Wainwright

arXiv:2211.03899 · stat.ML · November 9, 2022 · 1 citation

Open Access

TL;DR

This paper provides non-asymptotic guarantees for kernel-based multi-step temporal difference methods in estimating value functions from a single Markov process trajectory, highlighting effects of model mis-specification and mixing time.

Contribution

It introduces a unified analysis of multi-step TD methods with non-asymptotic bounds, including mis-specification effects and optimality results, for data from a single trajectory.

Findings

01. Bounds depend on Bellman fluctuations and mixing time.

02. Mis-specification inflates statistical error, mitigated by look-ahead.

03. Minimax lower bounds show optimality of proposed methods.

Abstract

We study non-parametric estimation of the value function of an infinite-horizon $γ$-discounted Markov reward process (MRP) using observations from a single trajectory. We provide non-asymptotic guarantees for a general family of kernel-based multi-step temporal difference (TD) estimates, including canonical $K$-step look-ahead TD for $K = 1, 2, \dots$ and the TD$(λ)$ family for $λ \in [0, 1)$ as special cases. Our bounds capture the dependence of the estimation error on Bellman fluctuations, the mixing time of the Markov chain, any mis-specification in the model, and the choice of weight function defining the estimator itself, and they reveal some delicate interactions between mixing time and model mis-specification. For a given TD method applied to a well-specified model, its statistical error under trajectory data is similar to that of i.i.d. sample transition pairs, whereas under…
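To make the $K$-step look-ahead idea concrete, the following is a minimal sketch and not the paper's kernel-based estimator. It assumes a finite state space with one-hot (tabular) features, a hypothetical two-state chain `P` with deterministic rewards `r`, and hypothetical helper names `simulate_mrp` and `k_step_td`. It solves the empirical $K$-step Bellman equation $V(s_t) \approx \sum_{k<K} γ^k r_{t+k} + γ^K V(s_{t+K})$ by least squares over one trajectory.

```python
import numpy as np

def simulate_mrp(P, r, n, rng):
    """Sample a single trajectory of n states; the reward is r[state] (assumed deterministic)."""
    states = np.empty(n, dtype=int)
    states[0] = 0
    for t in range(1, n):
        states[t] = rng.choice(len(r), p=P[states[t - 1]])
    return states, r[states]

def k_step_td(states, rewards, num_states, gamma, K):
    """Tabular K-step TD fixed point from one trajectory (illustrative stand-in for a kernel method)."""
    n = len(states) - K  # number of usable windows (t, t + K)
    # Discounted K-step reward sums G_t = sum_{k<K} gamma^k * r_{t+k}.
    G = sum(gamma**k * rewards[k:k + n] for k in range(K))
    A = np.zeros((num_states, num_states))
    b = np.zeros(num_states)
    # Accumulate the normal equations A V = b for one-hot features.
    np.add.at(A, (states[:n], states[:n]), 1.0)
    np.add.at(A, (states[:n], states[K:K + n]), -gamma**K)
    np.add.at(b, states[:n], G)
    return np.linalg.solve(A, b)

# Hypothetical two-state MRP, chosen only for illustration.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma = 0.9

rng = np.random.default_rng(0)
states, rewards = simulate_mrp(P, r, 20_000, rng)

V_true = np.linalg.solve(np.eye(2) - gamma * P, r)  # exact value function
V1 = k_step_td(states, rewards, 2, gamma, K=1)      # standard one-step TD fixed point
V5 = k_step_td(states, rewards, 2, gamma, K=5)      # five-step look-ahead
print(V_true, V1, V5)
```

In this well-specified tabular setting every choice of `K` targets the same value function, so the sketch only shows how the look-ahead enters the estimating equation. The mis-specification and mixing-time effects analyzed in the paper would require a restricted function class to observe.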

Taxonomy

Topics: Statistical Methods and Inference · Markov Chains and Monte Carlo Methods · Statistical Methods and Bayesian Inference