Risk-Aversion in Multi-armed Bandits

Amir Sani (INRIA Lille - Nord Europe); Alessandro Lazaric (INRIA Lille - Nord Europe); Rémi Munos (INRIA Lille - Nord Europe)

arXiv:1301.1936·cs.LG·January 10, 2013·92 cites

TL;DR

This paper introduces a risk-averse multi-armed bandit setting focusing on optimizing risk-return trade-offs rather than expected reward, proposing new algorithms and analyzing their theoretical and empirical performance.

Contribution

It presents a novel risk-averse bandit framework, develops two algorithms tailored for variance-based risk, and provides theoretical guarantees along with preliminary empirical results.

Findings

01 New risk-averse bandit setting based on variance

02 Two algorithms with theoretical guarantees

03 Preliminary empirical results demonstrating effectiveness

Abstract

Stochastic multi-armed bandits solve the Exploration-Exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk-aversion where the objective is to compete against the arm with the best risk-return trade-off. This setting proves to be intrinsically more difficult than the standard multi-armed bandit setting due in part to an exploration risk which introduces a regret associated with the variability of an algorithm. Using variance as a measure of risk, we introduce two new algorithms, investigate their theoretical guarantees, and report preliminary empirical results.
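
To make "best risk-return trade-off" concrete, the sketch below scores each arm by its empirical variance minus a weight rho times its empirical mean, and picks the arm with the lowest score. This scoring rule, the name rho, the function names, and the reward samples are all assumptions made for illustration; the abstract only says that variance is the risk measure, so this should not be read as the paper's algorithms.

```python
import statistics

def mean_variance(rewards, rho):
    """Empirical risk-return score: variance minus rho * mean (lower is better).

    Assumed formalization for illustration; rho is a hypothetical
    risk-tolerance weight (larger rho puts more weight on the mean).
    """
    mean = statistics.fmean(rewards)
    variance = statistics.pvariance(rewards, mu=mean)
    return variance - rho * mean

def pick_arm(samples_per_arm, rho):
    """Return the index of the arm with the lowest mean-variance score."""
    scores = [mean_variance(rewards, rho) for rewards in samples_per_arm]
    return min(range(len(scores)), key=scores.__getitem__)

# Hypothetical observed rewards for two arms.
arms = [
    [0.0, 4.0, 0.0, 4.0],  # higher mean (2.0) but high variance (4.0)
    [1.0, 1.2, 0.8, 1.0],  # lower mean (1.0) but low variance (0.02)
]

risk_averse_choice = pick_arm(arms, rho=1.0)    # favors the stable arm
return_seeking_choice = pick_arm(arms, rho=10.0)  # mean dominates the score
print(risk_averse_choice, return_seeking_choice)
```

With a small rho the stable arm wins even though its mean is lower, whereas a large rho recovers the usual choice of the arm with the highest mean. The sketch scores fixed samples only; it does not address the exploration risk the abstract mentions, which arises when an algorithm must gather those samples online.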

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Auction Theory and Applications