The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation

Philip Amortila; Nan Jiang; Csaba Szepesvári

arXiv:2307.13332 · cs.LG · December 18, 2023

TL;DR

This paper investigates the fundamental limits of approximation in off-policy value function estimation under model misspecification, providing optimal bounds across various norms and settings in reinforcement learning.

Contribution

The paper establishes the optimal asymptotic approximation factors (up to constants) for linear off-policy value function estimation across every setting it studies, clarifying how hard the problem inherently is under misspecification.

Findings

01

Identifies two instance-dependent factors for the $L_2(\mu)$ norm, where $\mu$ is the offline state distribution.

02

Identifies a single factor for the $L_\infty$ norm.

03

Provides tight bounds that characterize the hardness of off-policy evaluation.

Abstract

Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet the nature of such approximation factors, especially their optimal form in a given learning problem, is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. We study the approximation factor in a broad spectrum of settings, such as with the weighted $L_2$ norm (where the weighting is the offline state distribution), the $L_\infty$ norm, the presence vs. absence of state aliasing, and full vs. partial coverage of the state space. We establish the optimal asymptotic approximation factors (up to constants) for all of these settings. In particular, our bounds identify two instance-dependent factors for the $L_2(\mu)$ …
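
As a rough illustration of the term "approximation factor" (the notation here is assumed for exposition and is not taken from the paper): write $v^\pi$ for the target value function, $\Phi\theta$ for a linear candidate built from features $\Phi$, and $\hat{v}$ for an estimator's output. A guarantee with approximation factor $\alpha$ then typically has the shape

$$\|\hat{v} - v^\pi\| \;\le\; \alpha \cdot \inf_{\theta} \|\Phi\theta - v^\pi\| \;+\; \text{(statistical error)},$$

where the norm is either $L_2(\mu)$ or $L_\infty$ and the statistical error vanishes as the sample size grows. Under this reading, the asymptotic question is how small $\alpha$ can be made in each setting.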

Videos

The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms