Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

Yi Shen; Pan Xu; Michael M. Zavlanos

arXiv:2309.08748 · cs.LG · January 18, 2024 · 1 cites

Open Access

TL;DR

This paper introduces a Wasserstein-based distributionally robust policy evaluation and learning method for contextual bandits, addressing environment mismatch issues more effectively than traditional KL-based approaches.

Contribution

It proposes a novel Wasserstein DRO framework with efficient optimization techniques and theoretical guarantees, improving robustness in off-policy evaluation and learning.

Findings

01

Wasserstein DRO outperforms KL-based methods in environment mismatch scenarios.

02

The proposed method achieves competitive policy evaluation accuracy.

03

Theoretical analysis provides finite-sample and iteration-complexity bounds.

Abstract

Off-policy evaluation and learning are concerned with assessing a given policy and learning an optimal policy from offline data without direct interaction with the environment. Often, the environment in which the data are collected differs from the environment in which the learned policy is applied. To account for the effect of different environments during learning and execution, distributionally robust optimization (DRO) methods have been developed that compute worst-case bounds on the policy values assuming that the distribution of the new environment lies within an uncertainty set. Typically, this uncertainty set is defined based on the KL divergence around the empirical distribution computed from the logging dataset. However, the KL uncertainty set fails to encompass distributions with varying support and lacks awareness of the geometry of the distribution support. As a result, KL…
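The support limitation described above can be illustrated with a small numerical sketch. It is not the paper's method; it only compares the two discrepancy measures on a toy example. The grid, the three distributions, and the helper names `kl_divergence` and `wasserstein_1d` are all hypothetical, chosen for illustration. A distribution that places mass outside the support of the empirical distribution has infinite KL divergence from it, so no finite-radius KL ball can contain it, while its Wasserstein distance stays finite and grows with how far the mass has moved.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) on a shared discrete grid; infinite if p has mass where q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with zero mass under p contribute nothing
        if qi == 0.0:
            return math.inf  # p is not absolutely continuous w.r.t. q
        total += pi * math.log(pi / qi)
    return total

def wasserstein_1d(p, q, grid):
    """1-Wasserstein distance on a sorted 1-D grid via the CDF-difference formula."""
    dist = 0.0
    cdf_gap = 0.0
    for i in range(len(grid) - 1):
        cdf_gap += p[i] - q[i]  # running difference of the two CDFs
        dist += abs(cdf_gap) * (grid[i + 1] - grid[i])
    return dist

# Hypothetical toy data: an empirical distribution and two shifted versions.
grid = [0.0, 1.0, 2.0, 3.0]
empirical = [0.5, 0.5, 0.0, 0.0]      # logged data observed only at 0 and 1
shifted_near = [0.0, 0.5, 0.5, 0.0]   # all mass moved one grid step
shifted_far = [0.0, 0.0, 0.5, 0.5]    # all mass moved two grid steps

kl_near = kl_divergence(shifted_near, empirical)
kl_far = kl_divergence(shifted_far, empirical)
w_near = wasserstein_1d(shifted_near, empirical, grid)
w_far = wasserstein_1d(shifted_far, empirical, grid)

print("KL:", kl_near, kl_far)
print("Wasserstein:", w_near, w_far)
```

In this toy case KL cannot tell the nearby shift from the distant one, since both are infinitely far away, whereas the Wasserstein distance ranks them by how far the mass was transported. This is the sense in which the Wasserstein metric is aware of the geometry of the support.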

Taxonomy

Topics: Risk and Portfolio Optimization · Domain Adaptation and Few-Shot Learning