Risk-Averse Stochastic Convex Bandit
Adrian Rivera Cardoso, Huan Xu

TL;DR
This paper studies online convex optimization with bandit feedback for a risk-averse decision maker. It proposes two algorithms, one of which achieves (almost) optimal regret bounds with respect to the number of rounds.
Contribution
To the best of the authors' knowledge, this is the first attempt to address risk-aversion in the online convex bandit problem. The paper proposes two algorithms: an easy-to-implement descent-type algorithm and an algorithm that combines the ellipsoid method with a center point device.
Findings
The descent-type algorithm is easy to implement.
The ellipsoid-based algorithm achieves (almost) optimal regret bounds with respect to the number of rounds.
To the best of the authors' knowledge, no earlier work addresses risk-aversion in the online convex bandit problem.
Abstract
Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first one is a descent-type algorithm which is easy to implement. The second algorithm, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge this is the first attempt to address risk-aversion in the online convex bandit problem.
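To make "risk-averse" concrete, here is a minimal sketch of one common risk measure, Conditional Value at Risk (CVaR), estimated from sampled losses. The abstract above does not say which risk measure the paper uses, so the choice of CVaR, the function name `empirical_cvar`, and the parameter `alpha` are assumptions made for illustration only. This is not the authors' algorithm.

```python
import math


def empirical_cvar(losses, alpha):
    """Illustrative estimate: mean of the worst alpha-fraction of sampled losses.

    Assumption: larger loss values are worse, and alpha lies in (0, 1].
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not losses:
        raise ValueError("losses must be non-empty")
    ordered = sorted(losses, reverse=True)        # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))   # size of the tail to average
    return sum(ordered[:k]) / k


# A risk-neutral learner would compare decisions by mean loss (alpha = 1);
# a risk-averse learner compares them by the average of the worst outcomes.
samples = [1.0, 2.0, 3.0, 4.0]
mean_loss = empirical_cvar(samples, 1.0)
tail_loss = empirical_cvar(samples, 0.5)
```

With `alpha = 1` the estimate reduces to the ordinary mean. Smaller values of `alpha` put more weight on bad outcomes, which is the kind of behavior a risk-averse decision maker in clinical trials or finance would care about.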
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Risk and Portfolio Optimization
