Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction
Brahma S. Pavse, Josiah P. Hanna

TL;DR
This paper introduces a method to improve off-policy evaluation in high-dimensional reinforcement learning by using state abstraction to reduce the variance, and thereby improve the accuracy, of marginalized importance sampling estimators.
Contribution
It proposes a novel approach that leverages state abstraction to lower the variance of importance sampling ratios, improving OPE accuracy in high-dimensional spaces.
Findings
Lower variance in abstract ratios leads to reduced mean-squared error.
Abstract ratios improve robustness to hyperparameter tuning.
Method outperforms existing techniques on high-dimensional tasks.
Abstract
We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, π_e, using a fixed dataset, D, collected by one or more policies that may be different from π_e. Current OPE algorithms may produce poor OPE estimates under policy distribution shift, i.e., when the probability of a particular state-action pair occurring under π_e is very different from the probability of that same pair occurring in D (Voloshin et al. 2021, Fu et al. 2021). In this work, we propose to improve the accuracy of OPE estimators by projecting the high-dimensional state-space into a low-dimensional state-space using concepts from the state abstraction literature. Specifically, we consider marginalized importance sampling (MIS) OPE algorithms which compute state-action distribution…
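To make the variance argument concrete, here is a minimal tabular sketch of why ratios over abstract state-action pairs can vary less than ratios over ground pairs. It is not the paper's estimator: the visitation distributions `d_e` and `d_b`, the abstraction map `phi`, and the reward table are all hypothetical toy values, and the sketch assumes the distributions are known exactly and that the reward depends only on the abstract state. In practice the ratios must be estimated from data.

```python
from collections import defaultdict

def abstract_ratios(d_e, d_b, phi):
    """Sum ground visitation probabilities within each abstract
    (state, action) pair, then take the ratio of the two sums."""
    agg_e, agg_b = defaultdict(float), defaultdict(float)
    for (s, a), p in d_e.items():
        agg_e[(phi[s], a)] += p
    for (s, a), p in d_b.items():
        agg_b[(phi[s], a)] += p
    return {k: agg_e[k] / agg_b[k] for k in agg_b if agg_b[k] > 0}

def variance_under(d_b, weight_of):
    """Variance of a weight function when pairs are drawn from d_b."""
    mean = sum(p * weight_of(sa) for sa, p in d_b.items())
    return sum(p * (weight_of(sa) - mean) ** 2 for sa, p in d_b.items())

# Hypothetical toy problem: 4 ground states, 1 action, 2 abstract states.
d_b = {(0, "a"): 0.25, (1, "a"): 0.25, (2, "a"): 0.25, (3, "a"): 0.25}  # dataset
d_e = {(0, "a"): 0.4, (1, "a"): 0.2, (2, "a"): 0.3, (3, "a"): 0.1}      # evaluation policy
phi = {0: "x", 1: "x", 2: "y", 3: "y"}                                  # state abstraction
reward = {"x": 1.0, "y": 0.0}  # assumed to depend only on the abstract state

w_ground = {sa: d_e[sa] / d_b[sa] for sa in d_b}
w_abstract = abstract_ratios(d_e, d_b, phi)

var_ground = variance_under(d_b, lambda sa: w_ground[sa])
var_abstract = variance_under(d_b, lambda sa: w_abstract[(phi[sa[0]], sa[1])])

# Importance-weighted value estimates, computed in expectation over d_b.
value_ground = sum(p * w_ground[sa] * reward[phi[sa[0]]] for sa, p in d_b.items())
value_abstract = sum(
    p * w_abstract[(phi[sa[0]], sa[1])] * reward[phi[sa[0]]] for sa, p in d_b.items()
)

print(var_ground, var_abstract)
print(value_ground, value_abstract)
```

In this toy setting both weightings give the same expected value, while the abstract ratios have smaller variance under the dataset distribution. This is because each abstract ratio is the dataset-weighted average of the ground ratios it aggregates.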
Taxonomy
Topics: Age of Information Optimization · Reinforcement Learning in Robotics
