Minimax-Bayes Reinforcement Learning
Thomas Kleine Buening, Christos Dimitrakakis, Hannes Eriksson, Divya Grover, Emilio Jorge

TL;DR
This paper studies (sometimes approximate) minimax-Bayes solutions for reinforcement learning problems and finds that policies derived from a worst-case prior are more robust than those that assume a standard (i.e. uniform) prior.
Contribution
The paper analyzes minimax-Bayes approaches across various reinforcement learning problems, characterizing the worst-case priors and the robustness of the corresponding policies.
Findings
Minimax-Bayes policies are more robust than policies that assume a standard (i.e. uniform) prior.
The worst-case prior depends on the setting.
The analysis gives insight into the properties of worst-case priors and their corresponding policies in reinforcement learning.
Abstract
While the Bayesian decision-theoretic framework offers an elegant solution to the problem of decision making under uncertainty, one question is how to appropriately select the prior distribution. One idea is to employ a worst-case prior. However, this is not as easy to specify in sequential decision making as in simple statistical estimation problems. This paper studies (sometimes approximate) minimax-Bayes solutions for various reinforcement learning problems to gain insights into the properties of the corresponding priors and policies. We find that while the worst-case prior depends on the setting, the corresponding minimax policies are more robust than those that assume a standard (i.e. uniform) prior.
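The contrast between a uniform prior and a worst-case prior can be illustrated with a small sketch. This is not the paper's method or code: it assumes a toy one-shot problem with two candidate models and three deterministic policies, and the utility matrix `U`, the helper `bayes_optimal`, and the grid search over priors are all hypothetical choices made for illustration. In general the minimax policy may need to be randomized; the numbers here are chosen so that a deterministic one suffices.

```python
import numpy as np

# Hypothetical utilities: U[i, j] is the expected utility of policy i in model j.
U = np.array([
    [1.0, 0.1],  # policy 0: excellent in model 0, poor in model 1
    [0.3, 0.6],  # policy 1: better in model 1
    [0.5, 0.5],  # policy 2: same utility in both models
])


def bayes_optimal(prior):
    """Return the Bayes-optimal policy index and its Bayes utility for a prior."""
    values = U @ prior  # expected utility of each policy under the prior
    best = int(np.argmax(values))
    return best, float(values[best])


# Policy chosen under the standard (uniform) prior.
uniform_policy, uniform_value = bayes_optimal(np.array([0.5, 0.5]))

# Approximate worst-case prior: the prior that minimizes the Bayes-optimal
# utility, found by a grid search over beta = P(model 0).
betas = np.linspace(0.0, 1.0, 1001)
bayes_values = [bayes_optimal(np.array([b, 1.0 - b]))[1] for b in betas]
worst_beta = float(betas[int(np.argmin(bayes_values))])
minimax_policy, minimax_value = bayes_optimal(
    np.array([worst_beta, 1.0 - worst_beta])
)

# Robustness check: utility of each policy in its least favorable model.
worst_case = U.min(axis=1)
print("uniform-prior policy:", uniform_policy, "worst case:", worst_case[uniform_policy])
print("minimax-Bayes policy:", minimax_policy, "worst case:", worst_case[minimax_policy])
```

In this toy instance the policy that is optimal under the uniform prior performs badly if model 1 turns out to be the true one, while the policy that is optimal under the worst-case prior guarantees the same utility in both models.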
Taxonomy
Topics: Advanced Statistical Process Monitoring · Supply Chain and Inventory Management
