A Complete Characterization of Linear Estimators for Offline Policy Evaluation
Juan C. Perdomo, Akshay Krishnamurthy, Peter Bartlett, Sham Kakade

TL;DR
This paper characterizes exactly when linear estimators such as Fitted Q-Iteration (FQI) and Least Squares Temporal Difference learning (LSTD) succeed in offline policy evaluation, identifying both their limitations and the conditions under which they work.
Contribution
It introduces necessary and sufficient control-theoretic and linear-algebraic conditions for the success of classical linear estimators in offline policy evaluation, unifying and sharpening existing analyses.
Findings
LSTD succeeds under weaker conditions than FQI.
If LSTD fails, no linear estimator can succeed.
The paper establishes a hierarchy of regimes for estimator success.
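To make the gap between LSTD and FQI concrete, here is a minimal population-level sketch in Python. It is an illustration under stated assumptions, not the paper's construction: the moment matrices `sigma` (feature covariance), `sigma_cr` (cross-covariance between current and next-step features), and `b` (feature-reward moment) are hypothetical numbers chosen by hand. LSTD is written as a single linear solve, and FQI as repeated least-squares regression onto bootstrapped targets. For a fixed linear iteration like this one, convergence from every starting point requires the spectral radius of `gamma * inv(sigma) @ sigma_cr` to be below 1, whereas the LSTD solve only needs `sigma - gamma * sigma_cr` to be invertible.

```python
import numpy as np

def lstd(sigma, sigma_cr, b, gamma):
    """LSTD at the population level: solve (Sigma - gamma * Sigma_cr) theta = b."""
    return np.linalg.solve(sigma - gamma * sigma_cr, b)

def fqi(sigma, sigma_cr, b, gamma, n_iters=50):
    """FQI at the population level: theta <- Sigma^{-1} (b + gamma * Sigma_cr theta)."""
    theta = np.zeros_like(b)
    for _ in range(n_iters):
        theta = np.linalg.solve(sigma, b + gamma * sigma_cr @ theta)
    return theta

def fqi_spectral_radius(sigma, sigma_cr, gamma):
    """Spectral radius of the FQI iteration matrix gamma * Sigma^{-1} Sigma_cr."""
    iteration_matrix = gamma * np.linalg.solve(sigma, sigma_cr)
    return float(np.max(np.abs(np.linalg.eigvals(iteration_matrix))))

# Hypothetical moments (assumption, not taken from the paper).
gamma = 0.9
sigma = np.eye(2)                 # E[phi phi^T]
sigma_cr = np.diag([2.0, 0.5])    # E[phi phi'^T], next-step features
b = np.array([1.0, 1.0])          # E[phi * reward]

rho = fqi_spectral_radius(sigma, sigma_cr, gamma)
theta_lstd = lstd(sigma, sigma_cr, b, gamma)
theta_fqi = fqi(sigma, sigma_cr, b, gamma)

print("FQI spectral radius:", rho)
print("LSTD solution:", theta_lstd)
print("FQI iterate after 50 steps:", theta_fqi)
```

In this toy instance the FQI iteration matrix has an eigenvalue above 1, so the first coordinate of the FQI iterate grows without bound, while the LSTD system remains invertible and returns a finite solution. This is consistent with the finding that LSTD succeeds under weaker conditions than FQI, though the paper's actual conditions should be taken from the paper itself.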
Abstract
Offline policy evaluation is a fundamental statistical problem in reinforcement learning that involves estimating the value function of some decision-making policy given data collected by a potentially different policy. In order to tackle problems with complex, high-dimensional observations, there has been significant interest from theoreticians and practitioners alike in understanding the possibility of function approximation in reinforcement learning. Despite significant study, a sharp characterization of when we might expect offline policy evaluation to be tractable, even in the simplest setting of linear function approximation, has so far remained elusive, with a surprising number of strong negative results recently appearing in the literature. In this work, we identify simple control-theoretic and linear-algebraic conditions that are necessary and sufficient for classical…
Taxonomy
Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications
