State-Action Similarity-Based Representations for Off-Policy Evaluation

Brahma S. Pavse; Josiah P. Hanna

arXiv:2310.18409 · cs.LG · October 31, 2023 · 2 cites

TL;DR

This paper introduces a state-action similarity-based representation learning method that improves the data-efficiency and accuracy of off-policy evaluation (OPE) in reinforcement learning, in particular when the learned representation is used with the fitted Q-evaluation (FQE) algorithm.

Contribution

The paper proposes an OPE-specific state-action similarity metric and a learned encoder that improves FQE's data-efficiency and robustness against distribution shifts.

Findings

01. The proposed method outperforms other representation learning approaches in OPE tasks.

02. It reduces divergence of FQE under distribution shifts.

03. The learned representations improve OPE accuracy and data-efficiency.

Abstract

In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the expected return of an evaluation policy given a fixed dataset that was collected by running one or more different policies. One of the more empirically successful algorithms for OPE has been the fitted Q-evaluation (FQE) algorithm, which uses temporal difference updates to learn an action-value function that is then used to estimate the expected return of the evaluation policy. Typically, the original fixed dataset is fed directly into FQE to learn the action-value function of the evaluation policy. Instead, in this paper, we seek to enhance the data-efficiency of FQE by first transforming the fixed dataset using a learned encoder, and then feeding the transformed dataset into FQE. To learn such an encoder, we introduce an OPE-tailored state-action behavioral similarity metric, and use this metric…
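
The encode-then-evaluate pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the encoder `phi` here is a fixed one-hot map standing in for the learned encoder, the similarity metric used to train that encoder is not shown, and the temporal difference updates are replaced by a linear least-squares regression per iteration. The toy MDP, the function name `fitted_q_evaluation`, and all parameters are assumptions made for illustration.

```python
import numpy as np

def fitted_q_evaluation(phi, dataset, pi_e, n_actions,
                        gamma=0.9, n_iters=200, ridge=1e-6):
    """Linear FQE on encoded state-action pairs (illustrative sketch).

    phi(s, a) -> 1-D feature vector; stands in for a learned encoder.
    dataset   -> list of (s, a, r, s_next, done) transitions.
    pi_e(s)   -> array of action probabilities of the evaluation policy.
    """
    # Transform the fixed dataset with the encoder before running FQE.
    X = np.stack([phi(s, a) for s, a, _, _, _ in dataset])
    r = np.array([t[2] for t in dataset], dtype=float)
    not_done = np.array([0.0 if t[4] else 1.0 for t in dataset])
    # Expected next-step features under the evaluation policy.
    X_next = np.stack([
        sum(pi_e(s2)[a2] * phi(s2, a2) for a2 in range(n_actions))
        for _, _, _, s2, _ in dataset
    ])
    w = np.zeros(X.shape[1])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    for _ in range(n_iters):
        # Bootstrapped targets computed with the previous weights.
        targets = r + gamma * not_done * (X_next @ w)
        # Regress the encoded inputs onto the targets.
        w = np.linalg.solve(A, X.T @ targets)
    return w

# Hypothetical toy MDP: 2 states, 2 actions; taking action a moves to
# state a, and the reward is 1 for action 1 and 0 for action 0.
N_STATES, N_ACTIONS = 2, 2

def phi(s, a):
    # One-hot placeholder encoder (assumption, not a learned one).
    v = np.zeros(N_STATES * N_ACTIONS)
    v[s * N_ACTIONS + a] = 1.0
    return v

def pi_e(s):
    # Evaluation policy: always take action 1.
    return np.array([0.0, 1.0])

# Fixed dataset covering every state-action pair once, as if collected
# by a different (uniform) behavior policy.
dataset = [(s, a, float(a == 1), a, False)
           for s in range(N_STATES) for a in range(N_ACTIONS)]

w = fitted_q_evaluation(phi, dataset, pi_e, N_ACTIONS)

# Estimated return of pi_e from initial state 0:
# sum over actions of pi_e(a | s0) * Q(s0, a).
s0 = 0
value_estimate = sum(pi_e(s0)[a] * (phi(s0, a) @ w) for a in range(N_ACTIONS))
print(value_estimate)  # approximately 1 / (1 - 0.9) = 10 on this toy problem
```

With a one-hot encoder the sketch reduces to tabular evaluation. The premise of the paper is that an encoder that groups state-action pairs by an OPE-specific notion of similarity lets FQE reach an accurate estimate from less data.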

Code & Models

Repositories

badger-rl/rope (PyTorch, official)

Videos

State-Action Similarity-Based Representations for Off-Policy Evaluation · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics