Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits

Nicolas Galichet (LRI, INRIA Saclay - Ile de France), Michèle Sebag (LRI, INRIA Saclay - Ile de France), Olivier Teytaud (LRI, INRIA Saclay - Ile de France)

arXiv:1401.1123 · cs.LG · January 7, 2014 · 54 citations

TL;DR

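This paper introduces MARAB, a risk-aware multi-armed bandit algorithm that balances exploration, exploitation, and safety by using each arm's conditional value at risk as its quality measure. It is supported by a theoretical analysis of the limiting MIN algorithm and by experiments on artificial and real-world problems.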

Contribution

It presents the MARAB algorithm for risk-averse bandit problems and provides a theoretical analysis of the MIN algorithm, demonstrating robustness compared to UCB.

Findings

01 MARAB effectively limits risky exploration in bandit problems.

02 Theoretical analysis shows MIN's robustness over UCB under mild assumptions.

03 Experimental results validate MARAB and MIN's performance on artificial and real-world data.

Abstract

Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the exploration of risky arms, MARAB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MARAB tends toward the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness relative to UCB. The analysis is supported by extensive experimental validation of MIN and MARAB compared to UCB and state-of-the-art risk-aware MAB algorithms on artificial and real-world problems.
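
To make the arm-quality idea in the abstract concrete, here is a minimal sketch, not the paper's implementation. It assumes rewards where higher is better, estimates an arm's conditional value at risk as the mean of its worst `ceil(alpha * n)` observed rewards, and then picks the arm with the highest estimate. The names `empirical_cvar` and `pick_arm_by_cvar` are hypothetical, and the sketch is purely greedy: it leaves out any exploration term the actual MARAB algorithm may use. With a very small `alpha` the estimate reduces to the smallest observed reward, which mirrors the MIN criterion of preferring the arm with maximal minimal value.

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) rewards (lower tail, higher is better).

    Illustrative estimator only; the paper's exact definition may differ.
    """
    if not rewards:
        raise ValueError("need at least one observed reward")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    # Always keep at least one sample so the estimate is defined.
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


def pick_arm_by_cvar(history, alpha):
    """Return the index of the arm with the highest empirical CVaR.

    `history` is a list with one list of observed rewards per arm.
    Greedy selection only: no exploration bonus is included here.
    """
    scores = [empirical_cvar(rewards, alpha) for rewards in history]
    return max(range(len(scores)), key=scores.__getitem__)


# Arm 0 has the higher mean but an occasional very low reward;
# arm 1 is less rewarding on average but much steadier.
history = [
    [0.1, 0.9, 0.9, 0.9],
    [0.4, 0.5, 0.5, 0.6],
]
risk_neutral_choice = pick_arm_by_cvar(history, alpha=1.0)   # compares plain means
risk_averse_choice = pick_arm_by_cvar(history, alpha=0.25)   # compares worst outcomes
```

In this toy data the two settings disagree: with `alpha=1.0` the comparison is between the sample means and favors arm 0, while with `alpha=0.25` only the single worst reward of each arm is compared and the steadier arm 1 is chosen.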

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Reinforcement Learning in Robotics