Risk averse non-stationary multi-armed bandits

Leo Benac; Frédéric Godin

arXiv:2109.13977·cs.LG·September 30, 2021

Open Access

TL;DR

This paper introduces methods for risk-averse decision-making in non-stationary multi-armed bandit problems, using the conditional value-at-risk (CVaR) as the objective. It proposes two CVaR estimation techniques and shows in simulations that policies built on them outperform naive benchmarks.

Contribution

It presents two novel CVaR estimation methods for non-stationary losses and embeds them into classic arm-selection policies such as epsilon-greedy. The result is a set of risk-averse policies that account for loss distributions changing over time.
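For reference, the standard identities behind both estimation ideas are collected below. This is a sketch in our own notation, not the paper's. Losses $L$ follow the "larger is worse" convention, and $\alpha \in (0,1)$ is the confidence level (the paper may instead parametrize by the tail probability $1-\alpha$). The discounted weights $w_i \propto \gamma^{n-i}$ are one hypothetical choice for a weighted empirical distribution.

```latex
% Value-at-risk and CVaR of a loss L at confidence level \alpha \in (0,1)
\mathrm{VaR}_\alpha(L) = \inf\{x \in \mathbb{R} : \Pr(L \le x) \ge \alpha\}

\mathrm{CVaR}_\alpha(L) = \frac{1}{1-\alpha}\int_\alpha^1 \mathrm{VaR}_u(L)\,\mathrm{d}u
  = \min_{c \in \mathbb{R}} \Big\{ c + \frac{\mathbb{E}[(L-c)^+]}{1-\alpha} \Big\}
  % Rockafellar--Uryasev minimization formula

% Dual (robust) representation: worst-case mean over reweightings with bounded density
\mathrm{CVaR}_\alpha(L) = \sup\Big\{ \mathbb{E}_Q[L] : Q \ll P,\ \tfrac{\mathrm{d}Q}{\mathrm{d}P} \le \tfrac{1}{1-\alpha} \Big\}

% Weighted empirical distribution of past losses \ell_1,\dots,\ell_n (illustrative weights)
\hat F_n(x) = \sum_{i=1}^{n} w_i \,\mathbf{1}\{\ell_i \le x\}, \qquad
w_i = \frac{\gamma^{\,n-i}}{\sum_{j=1}^{n} \gamma^{\,n-j}}, \quad \gamma \in (0,1]
```

Substituting $\hat F_n$ for the true loss distribution in any of the CVaR expressions gives a plug-in estimate. Setting $\gamma = 1$ recovers the ordinary, non-discounted empirical CVaR.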

Findings

01

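Proposed two CVaR estimation techniques for non-stationary losses: one based on a weighted empirical distribution of losses, the other on the dual representation of the CVaR.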

02

Embedded these estimators into classic arm-selection methods such as epsilon-greedy policies.

03

Showed in simulations that the resulting policies outperform naive benchmarks that ignore non-stationarity.
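The sketch below is a minimal illustration of an epsilon-greedy policy that ranks arms by an estimated CVaR of their losses. Every detail is an assumption made for illustration only: the discounting by elapsed time with factor `gamma`, the names `weighted_cvar`, `epsilon_greedy_cvar` and `sample_loss`, the play-each-arm-once initialization, and the two-arm regime-switch environment. The paper's exact policies and simulation setup are not reproduced here.

```python
import numpy as np


def weighted_cvar(times, losses, gamma: float, alpha: float) -> float:
    """CVaR_alpha of one arm's losses under a discounted empirical distribution.

    Assumption (hypothetical scheme): an observation made at time t gets weight
    proportional to gamma ** (t_latest - t), so older losses count less.
    """
    t = np.asarray(times, dtype=float)
    x = np.asarray(losses, dtype=float)
    w = gamma ** (t.max() - t)      # newest observation gets weight 1
    w = w / w.sum()
    order = np.argsort(x)
    x, w = x[order], w[order]
    upper = np.cumsum(w)
    # Mass of each sorted atom lying above the alpha level of the cumulative distribution.
    tail_mass = np.clip(upper - np.maximum(upper - w, alpha), 0.0, None)
    return float(np.dot(x, tail_mass) / (1.0 - alpha))


def epsilon_greedy_cvar(sample_loss, n_arms: int, horizon: int, eps: float = 0.1,
                        gamma: float = 0.95, alpha: float = 0.9, seed: int = 0):
    """With prob. 1 - eps play the arm with the lowest estimated CVaR, else a random arm.

    sample_loss(arm, t, rng) -> float is a user-supplied loss generator (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    times = [[] for _ in range(n_arms)]
    losses = [[] for _ in range(n_arms)]
    choices = []
    for t in range(horizon):
        unplayed = [a for a in range(n_arms) if not losses[a]]
        if unplayed:
            arm = unplayed[0]                     # initialize: play each arm once
        elif rng.random() < eps:
            arm = int(rng.integers(n_arms))       # explore uniformly at random
        else:
            risk = [weighted_cvar(times[a], losses[a], gamma, alpha) for a in range(n_arms)]
            arm = int(np.argmin(risk))            # exploit: lowest estimated CVaR of losses
        times[arm].append(t)
        losses[arm].append(sample_loss(arm, t, rng))
        choices.append(arm)
    return choices


def sample_loss(arm, t, rng):
    """Illustrative regime switch: arm 0 is the safer arm until t = 500, then becomes much riskier."""
    if arm == 0:
        return rng.normal(0.0, 0.5 if t < 500 else 3.0)
    return rng.normal(0.5, 1.0)


choices = epsilon_greedy_cvar(sample_loss, n_arms=2, horizon=1000)
share_early = np.mean(np.array(choices[:500]) == 0)
share_late = np.mean(np.array(choices[500:]) == 0)
print(f"share of arm 0 before switch: {share_early:.2f}, after: {share_late:.2f}")
```

Because each weight depends on the age of the observation, an arm's estimate is driven mainly by its recent losses (roughly the last 1/(1 - gamma) observations when the arm is played every round). Passing `gamma=1.0` instead gives the plain empirical CVaR, which is the kind of benchmark that ignores non-stationarity.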

Abstract

This paper tackles the risk-averse multi-armed bandit problem when incurred losses are non-stationary. The conditional value-at-risk (CVaR) is used as the objective function. Two estimation methods are proposed for this objective function in the presence of non-stationary losses, one relying on a weighted empirical distribution of losses and another on the dual representation of the CVaR. Such estimates can then be embedded into classic arm-selection methods such as epsilon-greedy policies. Simulation experiments assess the performance of the arm-selection algorithms based on the two novel estimation approaches, and such policies are shown to outperform naive benchmarks that do not take non-stationarity into account.
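The following is a minimal NumPy sketch of two ways to compute a CVaR estimate from one arm's weighted loss history. It assumes exponentially discounted weights, and the helper names `discount_weights`, `cvar_weighted_empirical` and `cvar_rockafellar_uryasev` and the parameter `gamma` are hypothetical. The first function computes the CVaR of the weighted empirical distribution directly from its quantile function. The second minimizes the Rockafellar–Uryasev objective, a variational formula closely tied to the CVaR's dual representation, under the same weights. Neither is presented as the paper's estimator, and the details of the paper's dual-representation approach are not reproduced here.

```python
import numpy as np


def discount_weights(n: int, gamma: float) -> np.ndarray:
    """Exponential forgetting weights: the most recent of n losses gets the largest weight.

    Hypothetical weighting scheme for illustration; the paper's weights may differ.
    """
    w = gamma ** np.arange(n - 1, -1, -1, dtype=float)
    return w / w.sum()


def cvar_weighted_empirical(losses, weights, alpha: float) -> float:
    """CVaR_alpha of the weighted empirical distribution: mean of its quantile function over (alpha, 1)."""
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(losses)
    x, w = losses[order], weights[order]
    upper = np.cumsum(w)   # right end of each atom's probability interval
    lower = upper - w      # left end
    # Probability mass of each atom lying above the alpha level.
    tail_mass = np.clip(upper - np.maximum(lower, alpha), 0.0, None)
    return float(np.dot(x, tail_mass) / (1.0 - alpha))


def cvar_rockafellar_uryasev(losses, weights, alpha: float) -> float:
    """CVaR via min_c { c + E[(L - c)^+] / (1 - alpha) } under the weighted empirical distribution.

    The objective is convex and piecewise linear with kinks at the observed losses,
    so scanning the sample points yields the exact minimum.
    """
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    excess = np.maximum(losses[None, :] - losses[:, None], 0.0)  # row j holds (l_i - c_j)^+
    objective = losses + excess @ weights / (1.0 - alpha)
    return float(objective.min())


# Illustrative non-stationary data: the loss mean shifts halfway through the sample.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(2.0, 1.0, 200)])
w = discount_weights(len(sample), gamma=0.99)
print(cvar_weighted_empirical(sample, w, alpha=0.95))
print(cvar_rockafellar_uryasev(sample, w, alpha=0.95))
```

Both functions return the same value on any weighted sample. The Rockafellar–Uryasev objective is convex and piecewise linear with kinks only at observed losses, so its minimum is attained at one of them and coincides with the quantile-based tail mean.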

Taxonomy

Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Smart Grid Energy Management