Generalized Risk-Aversion in Stochastic Multi-Armed Bandits

Alexander Zimin; Rasmus Ibsen-Jensen; Krishnendu Chatterjee

arXiv:1405.0833 · cs.LG · May 6, 2014 · 20 cites

TL;DR

This paper studies regret minimization in stochastic multi-armed bandits when an arm is evaluated by a general function of its mean and variance rather than by its mean alone. It identifies conditions under which learning is possible and gives examples where no natural algorithm achieves sublinear regret.

Contribution

It characterizes conditions for learnability with general risk measures and shows that some functions prevent sublinear regret with natural algorithms.

Findings

01 Learning is possible under certain conditions on the risk function.

02 Some risk functions make sublinear regret impossible for natural algorithms.

03 The paper provides examples illustrating these limitations.

Abstract

We consider the problem of minimizing the regret in stochastic multi-armed bandits, when the measure of goodness of an arm is not the mean return, but some general function of the mean and the variance. We characterize the conditions under which learning is possible and present examples for which no natural algorithm can achieve sublinear regret.
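To make the setting concrete, here is a minimal Python sketch of ranking arms by a function of mean and variance instead of by mean alone. It is an illustration only, not the paper's algorithm: the helper names (`plug_in_value`, `mean_variance`, `best_arm`), the particular criterion `mean - rho * variance`, and the sample rewards are all assumptions introduced for this example.

```python
import statistics


def plug_in_value(samples, f):
    """Plug-in estimate of f(mean, variance) from an arm's observed rewards."""
    mean = statistics.fmean(samples)
    var = statistics.pvariance(samples, mu=mean)  # population variance
    return f(mean, var)


def mean_variance(mean, var, rho=1.0):
    """Hypothetical risk-averse criterion: reward the mean, penalize the variance."""
    return mean - rho * var


def best_arm(samples_per_arm, f):
    """Index of the arm whose estimated f(mean, variance) is largest."""
    values = [plug_in_value(s, f) for s in samples_per_arm]
    return max(range(len(values)), key=values.__getitem__)


# Made-up rewards: arm 0 has the higher mean but is noisy, arm 1 is constant.
rewards = [[0.0, 1.0], [0.4, 0.4]]

risk_neutral_choice = best_arm(rewards, lambda mean, var: mean)
risk_averse_choice = best_arm(rewards, mean_variance)
```

With these rewards the two criteria disagree about which arm is best, which is why regret has to be measured against the arm that is optimal under the chosen function rather than against the arm with the highest mean.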

Taxonomy

Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Smart Grid Energy Management