Conditionally Risk-Averse Contextual Bandits
Mónika Farsang, Paul Mineiro, Wangda Zhang

TL;DR
This paper introduces the first risk-averse contextual bandit algorithm with an online regret guarantee, addressing the challenge of balancing exploration and risk sensitivity in diverse real-world scenarios.
Contribution
It presents a novel risk-averse contextual bandit algorithm with theoretical guarantees, suitable for applications requiring worst-case outcome avoidance.
Findings
The algorithm achieves an online regret guarantee in the risk-averse setting.
Evaluated on dynamic pricing, inventory management, and self-tuning software.
Experiments include a production exascale data processing system.
Abstract
Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might accept degraded worst-case behaviour in exchange for better average performance. Designing a risk-averse contextual bandit is challenging because exploration is necessary, yet risk-aversion is sensitive to the entire distribution of rewards; nonetheless, we exhibit the first risk-averse contextual bandit algorithm with an online regret guarantee. We conduct experiments in diverse scenarios where worst-case outcomes should be avoided, spanning dynamic pricing, inventory management, and self-tuning software, including a production exascale data processing system.
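The tension between average-case and worst-case behaviour described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the reward samples are invented, and the risk measure used here (the mean of the worst fraction of outcomes, an empirical lower-tail average) is an assumption chosen only for illustration. The helper name `lower_tail_mean` is hypothetical.

```python
import numpy as np


def lower_tail_mean(rewards, alpha):
    """Mean of the worst alpha-fraction of rewards (illustrative risk measure)."""
    ordered = np.sort(np.asarray(rewards, dtype=float))
    # Keep at least one sample so the tail is never empty.
    k = max(1, int(np.ceil(alpha * len(ordered))))
    return ordered[:k].mean()


# Hypothetical reward samples for two actions in the same context.
safe = np.array([0.5] * 10)            # always a modest reward
risky = np.array([1.0] * 9 + [-3.0])   # usually better, occasionally very bad

# An average-case criterion compares means only.
mean_choice = "risky" if risky.mean() > safe.mean() else "safe"

# A risk-averse criterion compares the lower tails instead.
alpha = 0.2  # assumed tail fraction
tail_choice = (
    "risky"
    if lower_tail_mean(risky, alpha) > lower_tail_mean(safe, alpha)
    else "safe"
)

print(mean_choice, tail_choice)
```

Under these made-up samples the two criteria disagree: the mean favours the risky action while the lower-tail average favours the safe one. Note also that the tail estimate depends on the whole sample of rewards rather than a single summary statistic, which is the sensitivity the abstract refers to.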
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Auction Theory and Applications
