Offline Policy Optimization with Eligible Actions

Yao Liu; Yannis Flet-Berliac; Emma Brunskill

arXiv:2207.00632·cs.LG·July 5, 2022

TL;DR

This paper addresses overfitting in importance-weighted offline policy optimization by introducing a per-state-neighborhood normalization constraint, and reports reduced overfitting and improved test performance on healthcare and control tasks.

Contribution

The paper proposes a per-state-neighborhood normalization algorithm that mitigates overfitting in importance-weighted offline policy optimization, supported by a theoretical justification and empirical evaluation.

Findings

01 Reduced overfitting in policy learning
02 Improved test performance over existing methods
03 Effective in healthcare and control environments

Abstract

Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications. Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation, and such estimators typically do not require assumptions on the properties and representational capabilities of value function or decision process model function classes. In this paper, we identify an important overfitting phenomenon in optimizing the importance weighted return, in which it may be possible for the learned policy to essentially avoid making aligned decisions for part of the initial state space. We propose an algorithm to avoid this overfitting through a new per-state-neighborhood normalization constraint, and provide a theoretical justification of the proposed algorithm. We also show the limitations of previous…
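The overfitting described in the abstract can be illustrated with a toy example. The sketch below is not the paper's algorithm or its benchmark: the data set, the names `LOGGED`, `aligned`, and `avoiding`, and the choice of a self-normalized importance sampling estimator are all assumptions made for illustration. It shows how a policy that never agrees with the logged actions in a low-reward initial state can score higher than a policy that follows the data everywhere.

```python
from typing import Callable, List, Tuple

# One logged one-step interaction: (state, action, reward, behavior_prob),
# where behavior_prob is the logging policy's probability of the logged action.
Step = Tuple[int, int, float, float]
Policy = Callable[[int, int], float]  # returns pi(action | state)

# Hypothetical data: state 0 was logged with both actions, state 1 only with
# action 0 (so nothing is known about action 1 in state 1).
LOGGED: List[Step] = [
    (0, 0, 1.0, 0.5),
    (0, 1, 0.0, 0.5),
    (1, 0, 0.2, 1.0),
    (1, 0, 0.2, 1.0),
]


def importance_weights(data: List[Step], policy: Policy) -> List[float]:
    """Per-sample ratio pi(a|s) / mu(a|s)."""
    return [policy(s, a) / mu for (s, a, _r, mu) in data]


def is_estimate(data: List[Step], policy: Policy) -> float:
    """Ordinary importance sampling: average of weight * reward."""
    w = importance_weights(data, policy)
    return sum(wi * step[2] for wi, step in zip(w, data)) / len(data)


def snis_estimate(data: List[Step], policy: Policy) -> float:
    """Self-normalized importance sampling: divide by the sum of weights."""
    w = importance_weights(data, policy)
    total = sum(w)
    if total == 0.0:
        raise ValueError("policy never agrees with any logged action")
    return sum(wi * step[2] for wi, step in zip(w, data)) / total


def aligned(state: int, action: int) -> float:
    """Deterministic policy that picks action 0 in every state."""
    return 1.0 if action == 0 else 0.0


def avoiding(state: int, action: int) -> float:
    """Picks action 0 in state 0 but the never-logged action 1 in state 1."""
    chosen = 0 if state == 0 else 1
    return 1.0 if action == chosen else 0.0


if __name__ == "__main__":
    for name, pi in [("aligned", aligned), ("avoiding", avoiding)]:
        print(name, importance_weights(LOGGED, pi),
              round(is_estimate(LOGGED, pi), 3),
              round(snis_estimate(LOGGED, pi), 3))
```

In this toy data the `avoiding` policy places zero weight on every sample from state 1, so its self-normalized estimate is computed from state 0 alone and comes out higher than that of `aligned`, even though the data says nothing about what action 1 yields in state 1. The ordinary estimator does not reward this behavior here because all rewards are non-negative; which estimators are affected in general is a question the paper itself addresses.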

Code & Models

Repositories

stanfordai4hi/poela · PyTorch · Official

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Machine Learning and Data Classification