Group-Sensitive Offline Contextual Bandits
Yihong Guo, Junjie Luo, Guodong Gao, Ritu Agarwal, Anqi Liu

TL;DR
This paper introduces a fairness-aware offline policy optimization method for contextual bandits that reduces reward disparities across groups while maintaining overall reward performance.
Contribution
The paper proposes a constrained offline policy optimization framework that incorporates group-wise reward disparity constraints, uses a doubly robust estimator, and comes with convergence guarantees.
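To make the combination of a doubly robust estimator and a group-wise disparity concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names `dr_group_values` and `disparity`, the array layout, and the choice of "largest minus smallest group value" as the disparity measure are all assumptions made for illustration. It uses the standard doubly robust form (a reward-model term plus an importance-weighted correction on the logged action) and averages it within each group.

```python
import numpy as np

def dr_group_values(rewards, actions, groups, logging_probs, target_probs, reward_model):
    """Doubly robust estimate of the target policy's value within each group.

    All argument names and shapes are illustrative assumptions:
      rewards       (n,)   observed rewards for the logged actions
      actions       (n,)   integer indices of the logged actions
      groups        (n,)   group label of each sample
      logging_probs (n,)   logging-policy probability of the logged action
      target_probs  (n, k) target-policy probabilities over all k actions
      reward_model  (n, k) estimated reward for every action
    """
    idx = np.arange(len(rewards))
    # Direct-method term: expected modelled reward under the target policy.
    direct = (target_probs * reward_model).sum(axis=1)
    # Importance weight of the logged action under the target policy.
    weights = target_probs[idx, actions] / logging_probs
    # Correction term: weighted residual of the reward model on the logged action.
    per_sample = direct + weights * (rewards - reward_model[idx, actions])
    return {g.item(): float(per_sample[groups == g].mean()) for g in np.unique(groups)}

def disparity(group_values):
    """Gap between the best-off and worst-off group (one possible disparity measure)."""
    vals = list(group_values.values())
    return max(vals) - min(vals)
```

A constrained optimizer could then penalize or bound `disparity(...)` while maximizing the overall estimated value; how the paper actually enforces the constraint is not specified in this summary.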
Findings
Effectively reduces reward disparities in synthetic and real datasets.
Maintains competitive overall reward performance.
Provides convergence guarantees for the optimization process.
Abstract
Offline contextual bandits allow one to learn policies from historical/offline data without requiring online interaction. However, offline policy optimization that maximizes overall expected rewards can unintentionally amplify the reward disparities across groups. As a result, some groups might benefit more than others from the learned policy, raising concerns about fairness, especially when the resources are limited. In this paper, we study a group-sensitive fairness constraint in offline contextual bandits, reducing group-wise reward disparities that may arise during policy learning. We tackle the following common-parity requirements: the reward disparity is constrained within some user-defined threshold or the reward disparity should be minimized during policy optimization. We propose a constrained offline policy optimization framework by introducing group-wise reward disparity…
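The two parity requirements named in the abstract can be written down roughly as follows. The notation is assumed for illustration and may differ from the paper's: $\pi$ is the policy, $V(\pi)$ its overall expected reward, $V_g(\pi)$ its expected reward restricted to group $g$, $\epsilon$ the user-defined threshold, and $\lambda$ a hypothetical trade-off weight.

```latex
% (1) Thresholded disparity: maximize overall reward subject to a bounded gap.
\max_{\pi} \; V(\pi)
\quad \text{s.t.} \quad
\bigl| V_{g}(\pi) - V_{g'}(\pi) \bigr| \le \epsilon
\quad \text{for all groups } g, g'

% (2) Minimized disparity: trade overall reward against the gap directly.
\max_{\pi} \; V(\pi) - \lambda \max_{g, g'} \bigl| V_{g}(\pi) - V_{g'}(\pi) \bigr|,
\qquad \lambda \ge 0
```

In the offline setting, $V$ and $V_g$ are not observed and would be replaced by estimates from logged data.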
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Ethics and Social Impacts of AI
