Minimax-Bayes Reinforcement Learning

Thomas Kleine Buening; Christos Dimitrakakis; Hannes Eriksson; Divya Grover; Emilio Jorge

arXiv:2302.10831·cs.LG·February 22, 2023

TL;DR

This paper studies (sometimes approximate) minimax-Bayes solutions for several reinforcement learning problems. It finds that, although the worst-case prior depends on the setting, the resulting minimax policies are more robust than policies that assume a standard (i.e. uniform) prior.

Contribution

The paper studies minimax-Bayes solutions, some of them approximate, for a range of reinforcement learning problems, and examines the properties of the resulting worst-case priors and policies.

Findings

01. Minimax-Bayes policies are more robust than policies that assume a standard (i.e. uniform) prior.

02. The worst-case prior depends on the setting.

03. Studying minimax-Bayes solutions yields insights into the properties of the corresponding priors and policies.

Abstract

While the Bayesian decision-theoretic framework offers an elegant solution to the problem of decision making under uncertainty, one question is how to appropriately select the prior distribution. One idea is to employ a worst-case prior. However, this is not as easy to specify in sequential decision making as in simple statistical estimation problems. This paper studies (sometimes approximate) minimax-Bayes solutions for various reinforcement learning problems to gain insights into the properties of the corresponding priors and policies. We find that while the worst-case prior depends on the setting, the corresponding minimax policies are more robust than those that assume a standard (i.e. uniform) prior.
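
As an illustration of the minimax-Bayes idea described in the abstract, the following is a minimal sketch and not the paper's method. It assumes a hypothetical toy problem with two candidate models and two policies whose utilities are made up, and it finds the worst-case prior by brute-force grid search over priors. The names UTILITY, bayes_best_response, worst_case_prior and worst_model_utility are illustrative only.

```python
from typing import Sequence, Tuple

# Hypothetical toy problem (values are made up for illustration).
# Rows are policies, columns are candidate models (e.g. two MDPs).
# UTILITY[i][j] is the expected utility of policy i in model j.
UTILITY = [
    [1.0, 0.1],  # policy 0: excellent in model 0, poor in model 1
    [0.5, 0.4],  # policy 1: moderate in both models
]


def bayes_best_response(
    utility: Sequence[Sequence[float]], prior: Sequence[float]
) -> Tuple[int, float]:
    """Return (policy index, Bayes-expected utility) of the best policy under `prior`."""
    expected = [sum(p * u for p, u in zip(prior, row)) for row in utility]
    best = max(range(len(expected)), key=expected.__getitem__)
    return best, expected[best]


def worst_case_prior(
    utility: Sequence[Sequence[float]], steps: int = 100
) -> Tuple[float, float]:
    """Grid search over priors on two models for the prior with the lowest Bayes value."""
    candidates = [(k / steps, 1.0 - k / steps) for k in range(steps + 1)]
    return min(candidates, key=lambda prior: bayes_best_response(utility, prior)[1])


def worst_model_utility(utility: Sequence[Sequence[float]], policy: int) -> float:
    """Utility of `policy` in the model where it performs worst."""
    return min(utility[policy])


if __name__ == "__main__":
    uniform = (0.5, 0.5)
    adversarial = worst_case_prior(UTILITY)
    for name, prior in [("uniform", uniform), ("worst-case", adversarial)]:
        policy, value = bayes_best_response(UTILITY, prior)
        guarantee = worst_model_utility(UTILITY, policy)
        print(f"{name}: prior={prior}, policy={policy}, "
              f"Bayes value={value:.3f}, worst-model utility={guarantee:.3f}")
```

In this toy setting the uniform prior favours the policy that excels in one model but does poorly in the other, while the worst-case prior puts its mass on the harder model and selects the policy with the better guaranteed utility. In general the minimax policy may need to be randomised; the toy values were chosen so that a deterministic policy suffices.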

Code & Models

Repositories

minimaxbrl/minimax-bayes-rl (Official)

Taxonomy

Topics: Advanced Statistical Process Monitoring · Supply Chain and Inventory Management