The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation
Philip Amortila, Nan Jiang, Csaba Szepesvári

TL;DR
This paper investigates the fundamental limits of approximation in off-policy value function estimation under model misspecification, providing optimal bounds across various norms and settings in reinforcement learning.
Contribution
It establishes the first optimal asymptotic approximation factors for linear off-policy value estimation in diverse settings, clarifying the inherent difficulty of the problem.
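As a rough guide, an approximation factor $\alpha$ appears in guarantees of the generic form below. Here $v^\pi$ is the target policy's value function, $\Phi$ is the feature matrix, $\hat v$ is the estimate computed from (asymptotically unlimited) off-policy data, and $\|\cdot\|$ is the norm of interest. This is a hedged paraphrase of the usual meaning of the term, not the paper's exact formal definition.

```latex
\|\hat v - v^\pi\| \;\le\; \alpha \cdot \min_{\theta} \|\Phi\theta - v^\pi\|
```

Under misspecification the minimum on the right is strictly positive, so $\alpha$ measures how much an estimator inflates the best achievable linear error.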
Findings
Identifies two instance-dependent factors for the $L_2(d)$ norm.
Establishes a single factor for the $L_\infty$ norm.
Provides tight bounds that characterize the hardness of off-policy evaluation.
Abstract
Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet, the nature of such approximation factors, especially their optimal form in a given learning problem, is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. We study the approximation factor in a broad spectrum of settings, such as with the weighted $L_2$-norm (where the weighting is the offline state distribution), the $L_\infty$ norm, the presence vs. absence of state aliasing, and full vs. partial coverage of the state space. We establish the optimal asymptotic approximation factors (up to constants) for all of these settings. In particular, our bounds identify two instance-dependent factors for the …
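To make the notion of an approximation factor concrete, here is a small NumPy sketch. It assumes a finite-state MDP with a known target-policy transition matrix `P_pi`, reward vector `r`, off-policy state distribution `d`, and linear features `Phi`. It compares the weighted $L_2(d)$ error of the off-policy LSTD (TD fixed-point) solution to the best achievable linear error in the same norm, i.e. the empirical factor of one particular estimator on one instance. All function names are hypothetical, and LSTD is a standard baseline used only for illustration. It is not the estimator the paper proposes; as we read the abstract, the paper's optimal factors concern the best achievable estimator, not LSTD.

```python
import numpy as np


def weighted_l2_norm(x, d):
    """||x||_{2,d} = sqrt(sum_s d(s) * x(s)^2)."""
    return float(np.sqrt(np.sum(d * x ** 2)))


def true_value(P_pi, r, gamma):
    """v^pi = (I - gamma * P_pi)^{-1} r for a finite MDP."""
    n = len(r)
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r)


def best_l2_fit(Phi, v, d):
    """Best linear approximation of v in the L2(d) norm (weighted least squares)."""
    D = np.diag(d)
    theta = np.linalg.solve(Phi.T @ D @ Phi, Phi.T @ D @ v)
    return Phi @ theta


def lstd_estimate(Phi, P_pi, r, d, gamma):
    """Off-policy LSTD / TD fixed point with state weighting d.

    Solves Phi^T D (Phi - gamma * P_pi @ Phi) theta = Phi^T D r.
    """
    D = np.diag(d)
    A = Phi.T @ D @ (Phi - gamma * P_pi @ Phi)
    b = Phi.T @ D @ r
    return Phi @ np.linalg.solve(A, b)


def l2_ratio(Phi, P_pi, r, d, gamma):
    """LSTD's L2(d) error divided by the best achievable linear L2(d) error."""
    v = true_value(P_pi, r, gamma)
    err_lstd = weighted_l2_norm(lstd_estimate(Phi, P_pi, r, d, gamma) - v, d)
    err_best = weighted_l2_norm(best_l2_fit(Phi, v, d) - v, d)
    return err_lstd / err_best


def random_instance(n_states=6, n_features=2, seed=0):
    """Hypothetical random misspecified instance (fewer features than states)."""
    rng = np.random.default_rng(seed)
    P_pi = rng.random((n_states, n_states))
    P_pi /= P_pi.sum(axis=1, keepdims=True)  # row-stochastic target-policy transitions
    r = rng.random(n_states)
    d = rng.random(n_states) + 0.1
    d /= d.sum()  # off-policy data distribution, generally not stationary for P_pi
    Phi = rng.standard_normal((n_states, n_features))
    return Phi, P_pi, r, d


if __name__ == "__main__":
    Phi, P_pi, r, d = random_instance()
    print(f"LSTD / best L2(d) error ratio: {l2_ratio(Phi, P_pi, r, d, gamma=0.9):.3f}")
```

Because `best_l2_fit` is the $L_2(d)$ minimizer over the span of `Phi`, the ratio is always at least 1. When `d` differs from the target policy's stationary distribution it can become very large, which is one way the off-policy setting inflates approximation factors.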
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
