Distributionally Robust Batch Contextual Bandits

Nian Si; Fan Zhang; Zhengyuan Zhou; Jose Blanchet

arXiv:2006.05630 · cs.LG · September 13, 2023 · 1 citation

TL;DR

This paper develops a distributionally robust policy learning framework for contextual bandits that accounts for environment shifts, providing theoretical guarantees and demonstrating improved robustness over standard methods in synthetic and real-world datasets.

Contribution

It introduces a novel policy evaluation and learning algorithm that is robust to distributional shifts and adversarial perturbations, with theoretical performance guarantees.

Findings

01. The proposed method outperforms standard algorithms on synthetic datasets under environment shifts.

02. The approach provides reliable policy evaluation under worst-case distributional changes.

03. Empirical results show improved robustness on a real-world voting dataset.

Abstract

Policy learning using historical observational data is an important problem that has found widespread applications. Examples include selecting offers, prices, advertisements to send to customers, as well as selecting which medication to prescribe to a patient. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that has generated the data -- an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well the policy does under the worst-case environment shift. We then establish a central limit theorem type guarantee for this proposed policy evaluation scheme. Leveraging…
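
As an illustration of what assessing a policy "under the worst-case environment shift" can look like, here is a minimal Python sketch. It assumes the shift is bounded by a KL-divergence ball of radius `delta` around the data distribution and uses the standard dual form of that worst case, inf over Q with KL(Q||P) <= delta of E_Q[Y] = sup over alpha > 0 of { -alpha log E_P[exp(-Y/alpha)] - alpha delta }. The abstract above does not state which divergence or estimator the paper uses, so the names `ipw_weights` and `robust_value`, the self-normalized inverse-propensity weighting, and the grid search over `alpha` are all assumptions made for this sketch.

```python
import numpy as np


def ipw_weights(actions, policy_actions, propensities):
    """Inverse-propensity weights for a deterministic target policy.

    Hypothetical helper: a logged sample counts only when the logged action
    matches the action the target policy would have chosen.
    """
    actions = np.asarray(actions)
    policy_actions = np.asarray(policy_actions)
    propensities = np.asarray(propensities, dtype=float)
    return (actions == policy_actions) / propensities


def robust_value(rewards, weights, delta, alphas=None):
    """Approximate worst-case mean reward over a KL ball of radius delta.

    Assumption: the uncertainty set is {Q : KL(Q || P_n) <= delta}, where P_n
    is the (self-normalized) weighted empirical distribution of rewards.
    The dual objective is evaluated on a grid of alpha values, so the result
    is a lower bound on the exact supremum.
    """
    r = np.asarray(rewards, dtype=float)
    w = np.asarray(weights, dtype=float)
    if w.sum() <= 0:
        raise ValueError("no logged sample matches the target policy")
    w = w / w.sum()  # self-normalize the weights
    if delta == 0:
        return float(w @ r)  # no shift allowed: plain weighted mean
    if alphas is None:
        alphas = np.logspace(-3, 3, 2000)  # assumed search range for alpha
    keep = w > 0
    r, w = r[keep], w[keep]
    # z[k, i] = -r_i / alpha_k; subtract the row max for a stable log-sum-exp
    z = -r[None, :] / alphas[:, None]
    m = z.max(axis=1, keepdims=True)
    log_mgf = m[:, 0] + np.log((w * np.exp(z - m)).sum(axis=1))
    dual = -alphas * log_mgf - alphas * delta
    return float(dual.max())
```

Under these assumptions, `robust_value(rewards, ipw_weights(actions, policy_actions, propensities), delta)` returns the ordinary weighted mean at `delta = 0` and decreases as `delta` grows. The grid over `alpha` trades accuracy for simplicity; a one-dimensional concave maximizer would be a natural replacement.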

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning in Healthcare · Machine Learning and Algorithms