The Importance of Pessimism in Fixed-Dataset Policy Optimization
Jacob Buckman, Carles Gelada, Marc G. Bellemare

TL;DR
This paper shows how pessimism lets fixed-dataset policy optimization select a good policy even when the dataset is not informative of every policy, supported by theoretical analysis and experiments in gridworld and MinAtar environments.
Contribution
It introduces a unified framework for analyzing fixed-dataset policy optimization and shows that pessimistic algorithms remain effective on datasets that are not informative of every policy.
Findings
Pessimistic algorithms perform well without fully informative datasets.
Naive approaches risk overestimating values, leading to suboptimal policies.
Theoretical analysis supports the practical success of pessimism-based methods.
Abstract
We study worst-case guarantees on the expected return of fixed-dataset policy optimization algorithms. Our core contribution is a unified conceptual and mathematical framework for the study of algorithms in this regime. This analysis reveals that for naive approaches, the possibility of erroneous value overestimation leads to a difficult-to-satisfy requirement: in order to guarantee that we select a policy which is near-optimal, we may need the dataset to be informative of the value of every policy. To avoid this, algorithms can follow the pessimism principle, which states that we should choose the policy which acts optimally in the worst possible world. We show why pessimistic algorithms can achieve good performance even when the dataset is not informative of every policy, and derive families of algorithms which follow this principle. These theoretical findings are validated by…
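The contrast between the naive approach and the pessimism principle can be illustrated with a minimal sketch on a two-armed bandit with a fixed dataset. Everything here is an illustrative assumption and not the paper's algorithm: the function names, the toy dataset, and the count-based penalty `alpha / sqrt(n)` are hypothetical stand-ins for a generic uncertainty penalty. Each arm is assumed to have at least one sample.

```python
import math


def naive_choice(rewards_by_arm):
    """Pick the arm with the highest empirical mean reward."""
    means = {arm: sum(r) / len(r) for arm, r in rewards_by_arm.items()}
    return max(means, key=means.get)


def pessimistic_choice(rewards_by_arm, alpha=1.0):
    """Pick the arm with the highest penalized value estimate.

    The penalty alpha / sqrt(n) is an assumed, illustrative form of
    uncertainty: it shrinks as an arm gets more samples.
    """
    lower = {
        arm: sum(r) / len(r) - alpha / math.sqrt(len(r))
        for arm, r in rewards_by_arm.items()
    }
    return max(lower, key=lower.get)


# Hypothetical fixed dataset: one arm is well covered, the other has a
# single lucky sample that makes its empirical mean look best.
dataset = {
    "well_covered": [0.6, 0.7, 0.6, 0.7, 0.6, 0.7, 0.6, 0.7, 0.6],
    "rarely_tried": [1.0],
}

print(naive_choice(dataset))        # chosen by empirical mean alone
print(pessimistic_choice(dataset))  # chosen by penalized estimate
```

In this toy dataset the naive rule selects the arm whose high estimate rests on one sample, while the penalized rule prefers the arm whose value the data actually supports. Setting `alpha=0` recovers the naive rule.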
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Optimization and Search Problems
