Risk-Aware Algorithms for Combinatorial Semi-Bandits

Shaarad Ayyagari; Ambedkar Dukkipati

arXiv:2112.01141·cs.LG·December 3, 2021

TL;DR

This paper introduces risk-aware algorithms for combinatorial semi-bandits that maximize the Conditional Value-at-Risk (CVaR), a risk measure that accounts only for worst-case rewards, and provides regret bounds for the cases of Gaussian and bounded arm rewards.

Contribution

It presents algorithms and a regret analysis for risk-aware combinatorial semi-bandits that maximize CVaR, which the authors believe provide the first theoretical insights into this setting.

Findings

01 Algorithms for CVaR maximization in combinatorial bandits.

02 Regret bounds established for Gaussian and bounded rewards.

03 First theoretical analysis of risk-aware combinatorial semi-bandit problems.

Abstract

In this paper, we study the stochastic combinatorial multi-armed bandit problem under semi-bandit feedback. While much work has been done on algorithms that optimize the expected reward for linear as well as some general reward functions, we study a variant of the problem, where the objective is to be risk-aware. More specifically, we consider the problem of maximizing the Conditional Value-at-Risk (CVaR), a risk measure that takes into account only the worst-case rewards. We propose new algorithms that maximize the CVaR of the rewards obtained from the super arms of the combinatorial bandit for the two cases of Gaussian and bounded arm rewards. We further analyze these algorithms and provide regret bounds. We believe that our results provide the first theoretical insights into combinatorial semi-bandit problems in the risk-aware case.
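To make the objective concrete, the following is a minimal sketch of CVaR under one common convention: the mean of the worst alpha-fraction of rewards (the lower tail). It is not the paper's algorithm, and the paper's exact definition and sign convention may differ. The function names, the level alpha, and the two example super arms are hypothetical. The Gaussian form assumes the super-arm reward is itself Gaussian with known mean and standard deviation.

```python
import math
from statistics import NormalDist


def empirical_cvar(rewards, alpha):
    """Empirical lower-tail CVaR: mean of the worst ceil(alpha * n) rewards."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(rewards)  # ascending, so the worst rewards come first
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def gaussian_cvar(mu, sigma, alpha):
    """Lower-tail CVaR of N(mu, sigma^2): mu - sigma * pdf(z_alpha) / alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(alpha)  # alpha-quantile of the standard normal
    return mu - sigma * std_normal.pdf(z_alpha) / alpha


# Two hypothetical super arms, given as (mean, standard deviation) of the reward.
safe_arm = (1.0, 0.5)
risky_arm = (1.2, 2.0)
alpha = 0.1

# The risky arm has the higher mean, but the safe arm has the higher CVaR.
prefers_safe = gaussian_cvar(*safe_arm, alpha) > gaussian_cvar(*risky_arm, alpha)
```

In this illustration an expectation-maximizing learner would pick the risky super arm, while a CVaR-maximizing learner would pick the safe one, which is the distinction the abstract draws between the two objectives.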

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems