Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction

Brahma S. Pavse; Josiah P. Hanna

arXiv:2212.07486 · cs.LG · December 16, 2022 · 1 citation

TL;DR

This paper introduces a method for off-policy evaluation in high-dimensional reinforcement learning problems that uses state abstraction to reduce the variance, and thereby improve the accuracy, of marginalized importance sampling estimators.

Contribution

The paper proposes using state abstraction to lower the variance of marginalized importance sampling ratios, which improves OPE accuracy in high-dimensional state-spaces.

Findings

01. Lower variance in the abstract ratios leads to lower mean-squared error of the OPE estimate.

02. Abstract ratios make the estimates more robust to hyperparameter choices.

03. The method outperforms existing techniques on high-dimensional tasks.

Abstract

We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, $\pi_{e}$, using a fixed dataset, $D$, collected by one or more policies that may be different from $\pi_{e}$. Current OPE algorithms may produce poor OPE estimates under policy distribution shift, i.e., when the probability of a particular state-action pair occurring under $\pi_{e}$ is very different from the probability of that same pair occurring in $D$ (Voloshin et al. 2021; Fu et al. 2021). In this work, we propose to improve the accuracy of OPE estimators by projecting the high-dimensional state-space into a low-dimensional state-space using concepts from the state abstraction literature. Specifically, we consider marginalized importance sampling (MIS) OPE algorithms which compute state-action distribution…
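To make the idea of abstract ratios concrete, here is a minimal tabular sketch in Python. It is not the authors' algorithm: the paper's setting is high-dimensional and the ratios must be estimated from data, whereas this toy assumes the state-action distributions of the dataset (`d_b`) and of the evaluation policy (`d_e`) are known exactly. The four-state problem, the abstraction `phi`, the reward table, and all function names are hypothetical, and "value" here means the average per-step reward. The reward is deliberately chosen to depend only on the abstract state and action, which is the assumption under which the abstract ratios give the same expected estimate as the ground ratios in this toy.

```python
import numpy as np

# Hypothetical toy problem: 4 ground states, 2 actions.
# d_b: state-action distribution of the dataset D (assumed known here).
# d_e: state-action distribution under the evaluation policy (assumed known here).
d_b = np.full((4, 2), 1.0 / 8.0)
d_e = np.array([[0.05, 0.25],
                [0.15, 0.05],
                [0.10, 0.15],
                [0.15, 0.10]])

# Hypothetical abstraction phi: ground states {0, 1} -> 0 and {2, 3} -> 1.
phi = np.array([0, 0, 1, 1])

# Reward that depends only on (phi(s), a), so the abstraction preserves it.
r_abs = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
r = r_abs[phi]


def aggregate(d, phi):
    """Sum a ground state-action distribution over each abstract state."""
    out = np.zeros((phi.max() + 1, d.shape[1]))
    np.add.at(out, phi, d)
    return out


def variance_under(d, w):
    """Variance of the ratios w when (s, a) is drawn from distribution d."""
    mean = np.sum(d * w)
    return np.sum(d * (w - mean) ** 2)


# Ground ratios: one ratio per ground state-action pair.
w_ground = d_e / d_b

# Abstract ratios: ratio of the aggregated distributions, mapped back
# to ground states through phi.
w_abstract = (aggregate(d_e, phi) / aggregate(d_b, phi))[phi]

# Expected value of the importance-weighted estimate under d_b.
true_value = np.sum(d_e * r)
est_ground = np.sum(d_b * w_ground * r)
est_abstract = np.sum(d_b * w_abstract * r)

var_ground = variance_under(d_b, w_ground)
var_abstract = variance_under(d_b, w_abstract)

print("true value:", true_value)
print("ground estimate:", est_ground, "ratio variance:", var_ground)
print("abstract estimate:", est_abstract, "ratio variance:", var_abstract)
```

In this toy both sets of ratios recover the same expected value, while the abstract ratios vary less across the dataset distribution, because each abstract ratio is an average of the ground ratios it aggregates. With a finite dataset, lower ratio variance is what would be expected to translate into a lower-error estimate.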

Taxonomy

Topics: Age of Information Optimization · Reinforcement Learning in Robotics