Generalized Risk-Aversion in Stochastic Multi-Armed Bandits
Alexander Zimin, Rasmus Ibsen-Jensen, Krishnendu Chatterjee

TL;DR
This paper explores how to minimize regret in stochastic multi-armed bandits when the evaluation criterion is a general function of mean and variance, identifying conditions for learnability and limitations of algorithms.
Contribution
It characterizes conditions for learnability with general risk measures and shows that some functions prevent sublinear regret with natural algorithms.
Findings
Learning is possible under certain conditions on the risk function.
Some risk functions make sublinear regret impossible for natural algorithms.
The paper provides examples illustrating these limitations.
Abstract
We consider the problem of minimizing the regret in stochastic multi-armed bandits when the measure of goodness of an arm is not the mean return, but some general function of the mean and the variance. We characterize the conditions under which learning is possible and present examples for which no natural algorithm can achieve sublinear regret.
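To make the setting concrete, the following is a minimal sketch of scoring arms by a function of their empirical mean and variance rather than by the mean alone. It is not taken from the paper: the function names, the mean-variance criterion `mean - rho * variance`, and the parameter `rho` are all assumptions chosen for illustration, and the plug-in estimate shown is just one natural way to evaluate an arm from samples.

```python
import statistics

def mean_variance(mean, variance, rho=1.0):
    # Hypothetical risk measure: reward the mean, penalize the variance.
    # Any other function of (mean, variance) could be substituted here.
    return mean - rho * variance

def plug_in_scores(samples_per_arm, risk_fn):
    # Score each arm by applying risk_fn to its empirical mean and
    # empirical (population) variance.
    scores = []
    for samples in samples_per_arm:
        m = statistics.fmean(samples)
        v = statistics.pvariance(samples, mu=m)
        scores.append(risk_fn(m, v))
    return scores

def best_arm(samples_per_arm, risk_fn):
    # Index of the arm with the highest plug-in score (ties go to the
    # lowest index).
    scores = plug_in_scores(samples_per_arm, risk_fn)
    return max(range(len(scores)), key=scores.__getitem__)

# Arm 0 has the higher mean but is much more variable than arm 1.
observations = [[0.0, 4.0], [1.0, 1.0]]
risk_averse_choice = best_arm(observations, mean_variance)
risk_neutral_choice = best_arm(
    observations, lambda m, v: mean_variance(m, v, rho=0.0)
)
```

With these toy observations the two criteria disagree: the mean alone favors arm 0, while the variance penalty favors arm 1. Regret in this setting would be measured against the arm that is best under the chosen function, not the arm with the highest mean.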
Taxonomy
Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Smart Grid Energy Management
