Generalised Entropy MDPs and Minimax Regret
Emmanouil G. Androulakis, Christos Dimitrakakis

TL;DR
This paper addresses the problem of specifying priors in Bayesian methods by considering worst-case priors, which requires solving a stochastic zero-sum game. It extends well-known results from bandit theory to find minimax-Bayes policies and discusses when those policies are practical.
Contribution
It introduces a generalised entropy framework for Markov decision processes (MDPs) and extends results from bandit theory to obtain minimax-Bayes policies, that is, policies that are optimal against the worst-case prior.
Findings
Derived new minimax-Bayes policies for zero-sum games
Extended bandit theory results to generalised entropy MDPs
Discussed practical applicability of these policies
Abstract
Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.
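The worst-case-prior idea in the abstract can be illustrated with a toy one-shot game. This is a minimal sketch, not the paper's method: it assumes just two candidate models and two policies, the payoff numbers in `UTILITY` are hypothetical, and the names `bayes_value` and `worst_case_prior` are introduced here for illustration only. Nature picks a prior over the models, the agent responds with the Bayes-optimal policy, and the worst-case prior is the one that minimises the agent's Bayes-optimal value.

```python
def bayes_value(prior, utility):
    """Best expected utility over policies, given model probabilities `prior`.

    utility[i][j] is the (hypothetical) payoff of policy i under model j.
    """
    return max(
        sum(p * u for p, u in zip(prior, row))
        for row in utility
    )


def worst_case_prior(utility, grid=100):
    """Grid-search the prior over two models that minimises the Bayes value.

    The Bayes value is convex and piecewise linear in the prior, so a
    one-dimensional grid is enough for this two-model illustration.
    """
    candidates = [[i / grid, 1 - i / grid] for i in range(grid + 1)]
    return min(candidates, key=lambda prior: bayes_value(prior, utility))


# Hypothetical payoffs: policy 0 does well under model 0, policy 1 under model 1.
UTILITY = [
    [0.9, 0.2],
    [0.4, 0.7],
]

prior = worst_case_prior(UTILITY)
value = bayes_value(prior, UTILITY)
print("worst-case prior:", prior)
print("minimax-Bayes value:", value)
```

With these payoffs the two policies' expected utilities cross at a prior of one half on each model, which is where the agent's best achievable value is lowest (about 0.55). Realistic problems have many models and sequential policies, where grid search is not feasible and the game has to be solved by other means.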
Taxonomy
Topics: Advanced Bandit Algorithms Research · Neural Networks and Applications · Model Reduction and Neural Networks
