A Complete Characterization of Linear Estimators for Offline Policy Evaluation

Juan C. Perdomo; Akshay Krishnamurthy; Peter Bartlett; Sham Kakade

arXiv:2203.04236 · cs.LG · December 20, 2022

Open Access

TL;DR

This paper gives a complete characterization of when linear estimators such as FQI and LSTD succeed at offline policy evaluation, identifying both their limitations and the conditions that are necessary and sufficient for their success.

Contribution

The paper identifies control-theoretic and linear-algebraic conditions that are necessary and sufficient for classical linear estimators to succeed in offline policy evaluation, unifying and sharpening existing analyses.

Findings

1. LSTD succeeds under weaker conditions than FQI.

2. If LSTD fails, no linear estimator can succeed.

3. The paper establishes a hierarchy of regimes for estimator success.
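
The gap between LSTD and FQI in the first finding can be seen in a one-dimensional toy problem. The sketch below is illustrative only and is not taken from the paper: the function names (`make_transitions`, `moments`, `lstd`, `fqi`), the scalar feature, and the data are all assumptions chosen for the example. It uses the standard textbook forms of both estimators: LSTD solves the linear system (a - γc)θ = b directly, while FQI repeats the least-squares regression update θ ← (b + γcθ)/a. Here a is the mean of φ², c is the mean of φ·φ', and b is the mean of φ·r. In this scalar case the FQI update converges only when |γc/a| < 1, whereas LSTD needs only a ≠ γc.

```python
# Toy comparison of LSTD and FQI with a single scalar feature.
# Everything here (names, data, scalar setting) is a hypothetical
# illustration, not the construction used in the paper.

def make_transitions(pairs, gamma, theta_star):
    """Build (phi, reward, phi_next) tuples that are exactly realizable:
    the reward satisfies phi*theta_star = reward + gamma*phi_next*theta_star."""
    return [
        (phi, phi * theta_star - gamma * phi_next * theta_star, phi_next)
        for phi, phi_next in pairs
    ]

def moments(data):
    """Empirical second moment a, cross moment c, and reward moment b."""
    n = len(data)
    a = sum(phi * phi for phi, _, _ in data) / n
    c = sum(phi * phi_next for phi, _, phi_next in data) / n
    b = sum(phi * r for phi, r, _ in data) / n
    return a, c, b

def lstd(data, gamma):
    """Solve (a - gamma*c) * theta = b in one step."""
    a, c, b = moments(data)
    return b / (a - gamma * c)

def fqi(data, gamma, iters=50, theta=0.0):
    """Repeat the regression update theta <- (b + gamma*c*theta) / a."""
    a, c, b = moments(data)
    for _ in range(iters):
        theta = (b + gamma * c * theta) / a
    return theta

if __name__ == "__main__":
    gamma, theta_star = 0.9, 1.0
    # Next-state feature smaller than current feature: |gamma*c/a| = 0.45.
    stable = make_transitions([(1.0, 0.5)], gamma, theta_star)
    # Next-state feature larger than current feature: |gamma*c/a| = 1.8.
    unstable = make_transitions([(1.0, 2.0)], gamma, theta_star)
    for name, data in [("stable", stable), ("unstable", unstable)]:
        print(name, "LSTD:", lstd(data, gamma), "FQI:", fqi(data, gamma))
```

In the second dataset the FQI iterates move away from the true parameter at every step, while the LSTD system is still invertible and returns it. This only illustrates the direction of the first finding in the simplest possible case; the paper's conditions cover the general multi-dimensional setting.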

Abstract

Offline policy evaluation is a fundamental statistical problem in reinforcement learning that involves estimating the value function of some decision-making policy given data collected by a potentially different policy. In order to tackle problems with complex, high-dimensional observations, there has been significant interest from theoreticians and practitioners alike in understanding the possibility of function approximation in reinforcement learning. Despite significant study, a sharp characterization of when we might expect offline policy evaluation to be tractable, even in the simplest setting of linear function approximation, has so far remained elusive, with a surprising number of strong negative results recently appearing in the literature. In this work, we identify simple control-theoretic and linear-algebraic conditions that are necessary and sufficient for classical…

Taxonomy

Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications