Risk-Averse Stochastic Convex Bandit

Adrian Rivera Cardoso; Huan Xu

arXiv:1810.00737 · cs.LG · October 2, 2018 · 1 citation

Open Access

TL;DR

This paper introduces algorithms for risk-averse online convex optimization with bandit feedback, a problem not previously addressed in the literature. It proves regret bounds for these algorithms, and one of them attains (almost) optimal regret with respect to the number of rounds.

Contribution

To the authors' knowledge, it is the first work to incorporate risk-aversion into the online convex bandit setting. It proposes two algorithms: an easy-to-implement descent-type method and a method that combines the ellipsoid method with a center point device.
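The descent-type method can be illustrated with a minimal sketch. This page specifies neither the paper's risk measure nor its update rule, so both are assumptions here. The risk of a query point is taken to be the empirical conditional value-at-risk (CVaR) of a small batch of noisy losses. The gradient is replaced by the one-point spherical estimator of Flaxman, Kalai and McMahan (2005), a standard bandit convex optimization device substituted for illustration. The feasible set is assumed to be a Euclidean ball centred at the origin, and every function name below is hypothetical.

```python
import math
import random


def empirical_cvar(losses, alpha):
    """Assumed risk measure: mean of the worst ceil(alpha * n) losses (0 < alpha <= 1)."""
    k = max(1, math.ceil(alpha * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


def project_to_ball(x, radius):
    """Euclidean projection onto the ball of the given radius centred at the origin."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def random_unit_vector(d, rng):
    """Uniformly random direction on the unit sphere via a normalised Gaussian vector."""
    while True:
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in u))
        if norm > 0.0:
            return [v / norm for v in u]


def risk_averse_descent(sample_loss, d, rounds, batch, alpha, delta, eta, radius, seed=0):
    """Hypothetical descent-type loop that only sees loss values (bandit feedback).

    Each round queries y = x + delta * u, observes `batch` noisy losses at y,
    scores y by their empirical CVaR, and steps along the one-point estimate
    (d / delta) * risk * u. Iterates are kept in the ball of radius
    (radius - delta), so every query point y lies in the ball of radius `radius`.
    """
    if not 0.0 < delta < radius:
        raise ValueError("need 0 < delta < radius")
    rng = random.Random(seed)
    x = [0.0] * d
    for _ in range(rounds):
        u = random_unit_vector(d, rng)
        y = [xi + delta * ui for xi, ui in zip(x, u)]
        risk = empirical_cvar([sample_loss(y, rng) for _ in range(batch)], alpha)
        grad_est = [(d / delta) * risk * ui for ui in u]
        x = project_to_ball([xi - eta * gi for xi, gi in zip(x, grad_est)], radius - delta)
    return x


def noisy_quadratic(y, rng):
    """Toy loss for illustration only: squared distance to (0.5, ..., 0.5) plus Gaussian noise."""
    return sum((v - 0.5) ** 2 for v in y) + rng.gauss(0.0, 0.1)


if __name__ == "__main__":
    x_final = risk_averse_descent(noisy_quadratic, d=2, rounds=2000, batch=20,
                                  alpha=0.2, delta=0.1, eta=0.005, radius=1.0)
    print(x_final)
```

Running the module prints the final iterate. Because the one-point estimator has high variance, that iterate is only a rough indication of the risk-minimizing point, and in practice the step size `eta` and perturbation radius `delta` would need tuning.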

Findings

1. The descent-type algorithm is simple and easy to implement.

2. The ellipsoid-based algorithm achieves (almost) optimal regret bounds with respect to the number of rounds.

3. To the authors' knowledge, this is the first work to address risk-aversion in the online convex bandit problem.

Abstract

Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first one is a descent-type algorithm which is easy to implement. The second algorithm, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge, this is the first attempt to address risk-aversion in the online convex bandit problem.
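The second algorithm builds on the classical ellipsoid method. As a hedged illustration, the sketch below shows only the generic central-cut update for an ellipsoid E = {x : (x - c)^T P^{-1} (x - c) <= 1}. Given a cut direction g, it returns the minimum-volume ellipsoid containing the half {x in E : g^T (x - c) <= 0}. How the paper derives cut directions from noisy, risk-averse bandit feedback (the center point device) is not described on this page and is not modelled here. The cut vector is simply an input, and the function name is hypothetical.

```python
import math


def central_cut_update(c, P, g):
    """One central-cut ellipsoid step (requires dimension n >= 2).

    E = {x : (x - c)^T P^{-1} (x - c) <= 1}, with P symmetric positive definite.
    Keeps the half-ellipsoid {x in E : g^T (x - c) <= 0} and returns (c_new, P_new)
    describing the minimum-volume ellipsoid that contains it.
    """
    n = len(c)
    if n < 2:
        raise ValueError("the central-cut formula needs n >= 2")
    Pg = [sum(P[i][j] * g[j] for j in range(n)) for i in range(n)]
    gPg = sum(g[i] * Pg[i] for i in range(n))
    if gPg <= 0.0:
        raise ValueError("cut direction must be nonzero")
    b = [v / math.sqrt(gPg) for v in Pg]  # b = P g / sqrt(g^T P g)
    c_new = [c[i] - b[i] / (n + 1) for i in range(n)]
    scale = n * n / (n * n - 1.0)
    P_new = [[scale * (P[i][j] - 2.0 / (n + 1) * b[i] * b[j]) for j in range(n)]
             for i in range(n)]
    return c_new, P_new


if __name__ == "__main__":
    # Unit disk in R^2; the cut keeps the half with x_1 <= 0.
    # Mathematically the result is c = (-1/3, 0) and P = diag(4/9, 4/3).
    c, P = central_cut_update([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
    print(c, P)
```

In the unit-disk example the determinant of P shrinks from 1 to 16/27, so the area shrinks by a factor of sqrt(16/27), about 0.77. This is consistent with the standard guarantee that each central cut in dimension n reduces the volume by a factor below exp(-1/(2(n+1))).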

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Risk and Portfolio Optimization