Instance-dependent $\ell_\infty$-bounds for policy evaluation in tabular reinforcement learning

Ashwin Pananjady; Martin J. Wainwright

arXiv:1909.08749 · stat.ML · September 17, 2020 · 6 cites

TL;DR

This paper derives instance-dependent and data-dependent finite-sample bounds for policy evaluation in tabular reinforcement learning, providing minimax-optimal guarantees in the $\ell_\infty$-norm.

Contribution

It introduces non-asymptotic $\ell_\infty$-norm bounds for policy evaluation, covering both the standard plug-in estimator and a robust variant of it, with an analysis tailored to the specific MRP instance.

Findings

1. Bounds are minimax-optimal up to constant factors.
2. Data-dependent bounds can be computed from the observed samples.
3. The leave-one-out decoupling technique is a key analytical tool.

Abstract

Markov reward processes (MRPs) are used to model stochastic phenomena arising in operations research, control engineering, robotics, and artificial intelligence, as well as communication and transportation networks. In many of these cases, such as in the policy evaluation problem encountered in reinforcement learning, the goal is to estimate the long-term value function of such a process without access to the underlying population transition and reward functions. Working with samples generated under the synchronous model, we study the problem of estimating the value function of an infinite-horizon, discounted MRP on finitely many states in the $\ell_\infty$-norm. We analyze both the standard plug-in approach to this problem and a more robust variant, and establish non-asymptotic bounds that depend on the (unknown) problem instance, as well as data-dependent bounds that can be evaluated…
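As a rough illustration of the plug-in approach mentioned above, here is a minimal sketch under assumptions of our own. The value function of a discounted MRP with transition matrix $P$, mean reward vector $r$ and discount $\gamma \in (0, 1)$ solves the Bellman equation $V = r + \gamma P V$, so $V = (I - \gamma P)^{-1} r$. A plug-in estimator replaces $P$ and $r$ by empirical averages computed from $N$ synchronous samples (one next-state draw and one noisy reward per state per round) and solves the same linear system. The Gaussian reward noise and the helper names (`draw_synchronous_sample`, `plug_in_estimates`, `value_function`, `linf_error`) are hypothetical; the sketch does not implement the robust variant or the paper's bounds.

```python
import random


def draw_synchronous_sample(P, r, noise_sd, rng):
    """Hypothetical simulator: for EVERY state s, draw one next state from row s
    of P and one reward with mean r[s] (Gaussian noise is an assumption)."""
    n = len(P)
    next_states = [rng.choices(range(n), weights=P[s])[0] for s in range(n)]
    rewards = [r[s] + rng.gauss(0.0, noise_sd) for s in range(n)]
    return next_states, rewards


def plug_in_estimates(samples, num_states):
    """Empirical transition matrix P_hat and mean reward vector r_hat."""
    N = len(samples)
    P_hat = [[0.0] * num_states for _ in range(num_states)]
    r_hat = [0.0] * num_states
    for next_states, rewards in samples:
        for s in range(num_states):
            P_hat[s][next_states[s]] += 1.0 / N
            r_hat[s] += rewards[s] / N
    return P_hat, r_hat


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(n):
            if i != col:
                factor = M[i][col] / M[col][col]
                for j in range(col, n + 1):
                    M[i][j] -= factor * M[col][j]
    return [M[i][n] / M[i][i] for i in range(n)]


def value_function(P, r, gamma):
    """V = (I - gamma P)^{-1} r; I - gamma P is invertible for stochastic P, gamma < 1."""
    n = len(r)
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)]
         for i in range(n)]
    return solve_linear(A, r)


def linf_error(u, v):
    """The l_infinity distance max_s |u(s) - v(s)|."""
    return max(abs(a - b) for a, b in zip(u, v))


# Usage on a toy two-state MRP (all numbers are illustrative).
P = [[0.9, 0.1], [0.2, 0.8]]
r = [1.0, 0.0]
gamma = 0.9
rng = random.Random(0)
samples = [draw_synchronous_sample(P, r, 0.5, rng) for _ in range(5000)]
P_hat, r_hat = plug_in_estimates(samples, 2)
V_true = value_function(P, r, gamma)
V_hat = value_function(P_hat, r_hat, gamma)
print("l_inf error of plug-in estimate:", linf_error(V_hat, V_true))
```

Solving the linear system exactly, rather than iterating the empirical Bellman operator, keeps the sketch close to the plug-in definition; for larger state spaces a numerical library solver would be the natural choice.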

Taxonomy

Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Software Reliability and Analysis Research