Generalised Entropy MDPs and Minimax Regret

Emmanouil G. Androulakis; Christos Dimitrakakis

arXiv:1412.3276 · cs.LG · December 11, 2014 · 1 citation

TL;DR

This paper addresses the problem of specifying priors in Bayesian methods by considering worst-case priors. Finding them requires solving a stochastic zero-sum game, and the authors extend results from bandit theory to find minimax-Bayes policies and discuss when these policies are practical.

Contribution

It introduces a generalised entropy framework for Markov decision processes (MDPs) and extends results from bandit theory to derive minimax-Bayes policies, that is, policies that are optimal under the worst-case prior.
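In Bayesian treatments of bandits, a prior turns the bandit into an MDP whose state is the current belief. As a rough illustration of that standard construction (not the paper's generalised entropy formulation, which is not reproduced here), the sketch below computes the Bayes-optimal expected total reward for a Bernoulli bandit with independent Beta priors by backward induction over posterior parameters. The Beta-Bernoulli model and the function name `bayes_optimal_value` are assumptions chosen for illustration.

```python
from functools import lru_cache


def bayes_optimal_value(priors, horizon):
    """Expected total reward of the Bayes-optimal policy (illustrative sketch).

    priors: tuple of (alpha, beta) Beta parameters, one per arm; arms are
            assumed independent with Bernoulli rewards.
    horizon: number of pulls remaining.
    """

    @lru_cache(maxsize=None)
    def value(state, steps_left):
        # `state` is the belief (one (alpha, beta) pair per arm): the MDP state.
        if steps_left == 0:
            return 0.0
        best = float("-inf")
        for arm, (a, b) in enumerate(state):
            p = a / (a + b)  # posterior mean success probability of this arm
            # Posterior after observing a success or a failure on this arm.
            success = state[:arm] + ((a + 1, b),) + state[arm + 1:]
            failure = state[:arm] + ((a, b + 1),) + state[arm + 1:]
            q = p * (1.0 + value(success, steps_left - 1)) + (1.0 - p) * value(
                failure, steps_left - 1
            )
            best = max(best, q)
        return best

    return value(tuple(priors), horizon)
```

For two arms with uniform Beta(1, 1) priors and two pulls, `bayes_optimal_value(((1, 1), (1, 1)), 2)` gives 13/12, more than the 1.0 earned by pulling a fixed arm twice; the difference is the value of learning from the first observation. The number of belief states grows rapidly with the horizon and the number of arms, so exact recursion like this is only feasible for small problems.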

Findings

1. Derived minimax-Bayes policies by way of stochastic zero-sum games

2. Extended results from bandit theory to generalised entropy MDPs

3. Discussed when these policies are practical

Abstract

Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.
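The worst-case prior idea can be illustrated on a toy problem with two candidate models and a finite set of policies whose expected utilities are assumed known. The Bayes-optimal value under a prior is a maximum of functions that are linear in the prior, so it is convex. Nature's worst-case prior minimises it, and by the minimax theorem for finite zero-sum games this value equals what the best (possibly randomised) policy can guarantee against every model. The utility table `U` and the helpers `bayes_value` and `worst_case_prior` are hypothetical. A grid search is only adequate for two models; larger problems would need a linear program, and the paper itself works with MDPs rather than a fixed utility table.

```python
def bayes_value(U, p):
    """Expected utility of the best policy under the prior (p, 1 - p).

    U: list of (utility in model 0, utility in model 1), one entry per policy
       (a hypothetical, known utility table).
    """
    return max(p * u0 + (1.0 - p) * u1 for u0, u1 in U)


def worst_case_prior(U, grid=10_000):
    """Grid search for the weight on model 0 that minimises the Bayes value.

    Returns (p, value). Because bayes_value is convex in p, the minimiser is
    the worst-case prior for this two-model toy problem.
    """
    best_p, best_v = 0.0, bayes_value(U, 0.0)
    for k in range(1, grid + 1):
        p = k / grid
        v = bayes_value(U, p)
        if v < best_v:  # keep the first strict minimiser found
            best_p, best_v = p, v
    return best_p, best_v
```

With `U = [(1.0, 0.0), (0.0, 1.0)]`, each policy is good in only one model, so the worst-case prior is uniform (p = 0.5) and the minimax-Bayes value is 0.5. Randomising 50/50 between the two policies guarantees the same 0.5 in either model, as the minimax theorem predicts.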

Taxonomy

Topics: Advanced Bandit Algorithms Research · Neural Networks and Applications · Model Reduction and Neural Networks