The Importance of Pessimism in Fixed-Dataset Policy Optimization

Jacob Buckman; Carles Gelada; Marc G. Bellemare

arXiv:2009.06799 · cs.AI · December 1, 2020 · 23 cites

TL;DR

This paper shows why pessimistic algorithms for fixed-dataset policy optimization can perform well even when the dataset is not informative of every policy, supported by theoretical analysis and experiments in gridworld and MinAtar environments.

Contribution

It introduces a unified conceptual and mathematical framework for analyzing fixed-dataset policy optimization, and shows why pessimistic algorithms remain effective when the dataset is not informative of every policy.

Findings

01

Pessimistic algorithms can perform well even when the dataset is not informative of every policy.

02

Naive approaches risk erroneously overestimating values, which can lead them to select suboptimal policies.

03

Theoretical analysis supports the practical success of pessimism-based methods.

Abstract

We study worst-case guarantees on the expected return of fixed-dataset policy optimization algorithms. Our core contribution is a unified conceptual and mathematical framework for the study of algorithms in this regime. This analysis reveals that for naive approaches, the possibility of erroneous value overestimation leads to a difficult-to-satisfy requirement: in order to guarantee that we select a policy which is near-optimal, we may need the dataset to be informative of the value of every policy. To avoid this, algorithms can follow the pessimism principle, which states that we should choose the policy which acts optimally in the worst possible world. We show why pessimistic algorithms can achieve good performance even when the dataset is not informative of every policy, and derive families of algorithms which follow this principle. These theoretical findings are validated by…
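The contrast between naive and pessimistic selection can be illustrated with a minimal sketch in a bandit-style setting. This is not the paper's algorithm: the action names, the toy dataset, and the count-based penalty `alpha / sqrt(n)` are all assumptions chosen only to show how a rarely observed action with a lucky reward can fool a naive estimator, while a pessimistic one discounts it.

```python
import math
from collections import defaultdict


def summarize(dataset):
    """Return {action: (empirical_mean, count)} from (action, reward) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for action, reward in dataset:
        totals[action] += reward
        counts[action] += 1
    return {a: (totals[a] / counts[a], counts[a]) for a in counts}


def naive_choice(stats):
    """Pick the action with the highest empirical mean, ignoring uncertainty."""
    return max(stats, key=lambda a: stats[a][0])


def pessimistic_choice(stats, alpha=1.0):
    """Pick the action with the highest penalized estimate.

    The penalty alpha / sqrt(count) is a hypothetical uncertainty term
    used for illustration; it shrinks as an action is observed more often.
    """
    return max(stats, key=lambda a: stats[a][0] - alpha / math.sqrt(stats[a][1]))


# Toy fixed dataset (assumed): "safe" is observed 100 times with mean
# reward about 0.5; "rare" is observed once with a reward of 1.0.
dataset = [("safe", 0.6)] * 50 + [("safe", 0.4)] * 50 + [("rare", 1.0)]
stats = summarize(dataset)

naive = naive_choice(stats)              # trusts the single lucky sample
pessimistic = pessimistic_choice(stats)  # discounts the poorly covered action
```

In this toy dataset the naive rule selects `rare` on the strength of one sample, whereas the penalized rule selects `safe`, whose value the data actually supports. Setting `alpha=0.0` recovers the naive rule.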

Code & Models

Repositories

jbuckman/tiopifdpo (PyTorch, official)

Videos

The Importance of Pessimism in Fixed-Dataset Policy Optimization · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Optimization and Search Problems