Percentile Criterion Optimization in Offline Reinforcement Learning

Elita A. Lobo; Cyrus Cousins; Yair Zick; Marek Petrik

arXiv:2404.05055 · cs.LG · April 9, 2024 · 2 cites

TL;DR

This paper introduces a new dynamic programming algorithm for offline reinforcement learning that optimizes the percentile criterion more efficiently by implicitly constructing smaller ambiguity sets, leading to less conservative policies.

Contribution

A novel Value-at-Risk based dynamic programming method that avoids explicit ambiguity set construction in percentile criterion optimization.
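
For reference, the percentile criterion is commonly written as the optimization problem below. The notation is an assumption made here for illustration and is not taken from this page: $\rho(\pi, \tilde{P})$ denotes the return of policy $\pi$ under a model $\tilde{P}$, $f$ is the posterior distribution over models given the data, and $\delta$ is the tolerated probability of falling short of the guarantee $y$.

```latex
\max_{\pi \in \Pi,\; y \in \mathbb{R}} \; y
\quad \text{subject to} \quad
\Pr_{\tilde{P} \sim f}\bigl[\, \rho(\pi, \tilde{P}) \ge y \,\bigr] \ge 1 - \delta
```

Under this reading, the optimal $y$ for a fixed policy is the Value-at-Risk of its return at level $\delta$, which is why a Value-at-Risk based method is a natural fit for this criterion.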

Findings

1. Implicitly constructs smaller ambiguity sets
2. Learns less conservative robust policies
3. Outperforms existing Bayesian credible region methods

Abstract

In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion. The percentile criterion is approximately solved by constructing an ambiguity set that contains the true model with high probability and optimizing the policy for the worst model in the set. Since the percentile criterion is non-convex, constructing ambiguity sets is often challenging. Existing work uses Bayesian credible regions as ambiguity sets, but they are often unnecessarily large and result in learning overly conservative policies. To overcome these shortcomings, we propose a novel Value-at-Risk based dynamic programming algorithm to optimize the percentile criterion without explicitly constructing any ambiguity sets. Our theoretical and empirical results show that our algorithm implicitly…
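
To make the idea in the abstract concrete, here is a minimal Python sketch of a Value-at-Risk style value iteration. It is not the authors' implementation (the official code is in the repository listed on this page), and everything in it is an assumption made for illustration: the function names `var_backup` and `var_value_iteration` are hypothetical, model uncertainty is represented by a finite set of sampled transition models, rewards are assumed known, and the Value-at-Risk is taken as an empirical lower quantile over those samples.

```python
import numpy as np


def var_backup(P_samples, rewards, v, gamma, delta):
    """One VaR-style Bellman backup (illustrative sketch).

    P_samples: array of shape (N, S, A, S), N sampled transition models.
    rewards:   array of shape (S, A), assumed known.
    Returns the per-(s, a) empirical delta-quantile of the sampled backups.
    """
    n_samples = P_samples.shape[0]
    # q[n, s, a] = r[s, a] + gamma * sum_{s'} P_n[s, a, s'] * v[s']
    q = rewards[None, :, :] + gamma * (P_samples @ v)
    # Empirical lower quantile: the floor(delta * N)-th smallest sample.
    k = min(int(np.floor(delta * n_samples)), n_samples - 1)
    return np.sort(q, axis=0)[k]


def var_value_iteration(P_samples, rewards, gamma=0.9, delta=0.1,
                        tol=1e-8, max_iter=10_000):
    """Iterate the VaR backup until the value function stops changing."""
    v = np.zeros(rewards.shape[0])
    for _ in range(max_iter):
        v_new = var_backup(P_samples, rewards, v, gamma, delta).max(axis=1)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    policy = var_backup(P_samples, rewards, v, gamma, delta).argmax(axis=1)
    return v, policy


# Usage on a small random problem: 50 sampled models, 3 states, 2 actions.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(50, 3, 2))  # shape (50, 3, 2, 3)
r = rng.uniform(size=(3, 2))
values, policy = var_value_iteration(P, r, gamma=0.9, delta=0.1)
print(values, policy)
```

Note that no ambiguity set is built anywhere in this sketch: the quantile is taken directly over the sampled backups for each state-action pair. Setting `delta=0` takes the minimum over the samples, which corresponds to the worst-case (most conservative) backup over the sampled models.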

Code & Models

Repositories

elitalobo/varframework (Official)

Taxonomy

Topics: Supply Chain and Inventory Management