Conditionally Risk-Averse Contextual Bandits

Mónika Farsang; Paul Mineiro; Wangda Zhang

arXiv:2210.13573·stat.ML·July 11, 2023

TL;DR

This paper introduces the first risk-averse contextual bandit algorithm with an online regret guarantee. The difficulty it addresses is that exploration is necessary, while risk-aversion is sensitive to the entire distribution of rewards.

Contribution

A risk-averse contextual bandit algorithm with an online regret guarantee, aimed at applications where worst-case outcomes should be avoided.

Findings

01 The algorithm comes with an online regret guarantee in the risk-averse setting.

02 Experiments cover dynamic pricing, inventory management, and self-tuning software.

03 The self-tuning software experiments include a production exascale data processing system.

Abstract

Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might accept degraded worst-case behaviour in exchange for better average performance. Designing a risk-averse contextual bandit is challenging because exploration is necessary but risk-aversion is sensitive to the entire distribution of rewards; nonetheless, we exhibit the first risk-averse contextual bandit algorithm with an online regret guarantee. We conduct experiments in diverse scenarios where worst-case outcomes should be avoided, including dynamic pricing, inventory management, and self-tuning software, the last of which includes a production exascale data processing system.
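
The trade-off described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the reward samples are hypothetical, and the lower-tail mean (an empirical CVaR-style summary) is an assumed stand-in for a risk measure, chosen only because it is easy to compute by hand. It shows how an action can win on average reward while losing on worst-case behaviour.

```python
from statistics import mean


def lower_tail_mean(rewards, alpha):
    """Average of the worst alpha-fraction of rewards (illustrative risk summary)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(rewards)
    # Keep at least one sample so the tail is never empty.
    k = max(1, int(len(ordered) * alpha))
    return mean(ordered[:k])


# Hypothetical reward samples for two actions in the same context.
steady = [0.5] * 10              # always a moderate reward
risky = [1.0] * 8 + [-1.0] * 2   # usually high, occasionally very bad

samples = {"steady": steady, "risky": risky}
summary = {
    name: (mean(r), lower_tail_mean(r, alpha=0.2))
    for name, r in samples.items()
}

# An average-case criterion and a tail-sensitive criterion can disagree.
best_by_mean = max(summary, key=lambda name: summary[name][0])
best_by_tail = max(summary, key=lambda name: summary[name][1])

print("best by mean:", best_by_mean)
print("best by lower tail:", best_by_tail)
```

Because the tail summary depends on the whole reward distribution rather than a single expected value, estimating it for rarely chosen actions requires exploration, which is the tension the abstract points to.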

Code & Models

Repositories

zwd-ms/risk_averse_cb (PyTorch, official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Auction Theory and Applications