Risk-Averse No-Regret Learning in Online Convex Games
Zifan Wang, Yi Shen, Michael M. Zavlanos

TL;DR
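This paper introduces a risk-averse online learning algorithm for convex games in which agents minimize the CVaR of their costs using only bandit feedback. The algorithm achieves sub-linear regret with high probability, and two variants improve CVaR estimation accuracy and reduce gradient variance.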
Contribution
The paper develops the first online risk-averse learning algorithm for convex games using CVaR with bandit feedback, including two variants with enhanced performance.
Findings
Achieves sub-linear regret with high probability.
Two variants improve the accuracy of the CVaR estimates and reduce the variance of the gradient estimates.
Demonstrates effectiveness on an online Cournot market game.
Abstract
We consider an online stochastic game with risk-averse agents whose goal is to learn optimal decisions that minimize the risk of incurring significantly high costs. Specifically, we use the Conditional Value at Risk (CVaR) as a risk measure that the agents can estimate using bandit feedback in the form of the cost values of only their selected actions. Since the distributions of the cost functions depend on the actions of all agents that are generally unobservable, they are themselves unknown and, therefore, the CVaR values of the costs are difficult to compute. To address this challenge, we propose a new online risk-averse learning algorithm that relies on one-point zeroth-order estimation of the CVaR gradients computed using CVaR values that are estimated by appropriately sampling the cost functions. We show that this algorithm achieves sub-linear regret with high probability. We also…
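The two ingredients named in the abstract, a sample-based CVaR estimate and a one-point zeroth-order gradient estimate built from it, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function names `empirical_cvar` and `one_point_cvar_gradient`, the tail-average estimator, the convention that CVaR at level `alpha` is the mean of the worst `alpha`-fraction of costs, the single-agent toy cost, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def empirical_cvar(costs, alpha):
    """Average of the largest alpha-fraction of the sampled costs.

    Assumed convention: alpha in (0, 1], and alpha = 1 recovers the plain mean.
    """
    sorted_costs = np.sort(np.asarray(costs, dtype=float))[::-1]  # descending
    k = max(1, int(np.ceil(alpha * len(sorted_costs))))
    return sorted_costs[:k].mean()


def one_point_cvar_gradient(sample_cost, x, alpha, delta, n_samples, rng):
    """One-point zeroth-order estimate of the CVaR gradient at x.

    sample_cost(x) returns one noisy cost value for action x (bandit feedback).
    Only a single perturbed action x + delta * u is queried.
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniform direction on the unit sphere
    x_perturbed = x + delta * u
    costs = [sample_cost(x_perturbed) for _ in range(n_samples)]
    cvar_hat = empirical_cvar(costs, alpha)
    return (d / delta) * cvar_hat * u


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Hypothetical toy cost: quadratic in the action plus Gaussian noise.
    def toy_cost(x):
        return float(np.sum((x - 1.0) ** 2) + 0.1 * rng.standard_normal())

    x = np.zeros(2)
    step_size = 0.01
    for _ in range(200):
        g = one_point_cvar_gradient(toy_cost, x, alpha=0.2, delta=0.5,
                                    n_samples=20, rng=rng)
        # Gradient step followed by projection onto an assumed box [-2, 2]^2.
        x = np.clip(x - step_size * g, -2.0, 2.0)
    print("final action:", x)
```

In a game, `sample_cost` would also depend on the unobserved actions of the other agents, so each agent would run such an update on its own action only. One-point estimates are known to have high variance, which is consistent with the summary's note that the variants target gradient variance reduction.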
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Machine Learning and Algorithms
