Group-Sensitive Offline Contextual Bandits

Yihong Guo; Junjie Luo; Guodong Gao; Ritu Agarwal; Anqi Liu

arXiv:2510.27123 · cs.LG · January 7, 2026

Open Access

TL;DR

This paper introduces a fairness-aware offline policy optimization method for contextual bandits that reduces reward disparities across groups while maintaining overall reward performance.

Contribution

It proposes a constrained offline policy optimization framework that imposes group-wise reward disparity constraints, uses a doubly robust estimator, and comes with convergence guarantees.
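As a rough illustration of what a doubly robust estimate of group-wise reward disparity can look like, here is a minimal sketch. It is not the paper's implementation: the function names `dr_group_values` and `disparity`, the array layout, and the choice of "largest minus smallest group value" as the disparity measure are all assumptions made for this example.

```python
import numpy as np

def dr_group_values(rewards, actions, groups, logging_probs, target_probs, q_hat):
    """Doubly robust estimate of a target policy's value within each group.

    Hypothetical inputs (names are assumptions, not from the paper):
      rewards:       (n,)   observed rewards in the logged data
      actions:       (n,)   logged action indices
      groups:        (n,)   group label of each sample
      logging_probs: (n,)   logging policy's probability of the logged action
      target_probs:  (n, k) target policy's probabilities over all k actions
      q_hat:         (n, k) reward-model predictions for all k actions
    """
    idx = np.arange(len(rewards))
    # Direct-method term: model-predicted reward averaged under the target policy.
    dm = np.sum(target_probs * q_hat, axis=1)
    # Importance weight of the logged action under the target policy.
    w = target_probs[idx, actions] / logging_probs
    # Doubly robust score: model prediction plus weighted residual correction.
    dr = dm + w * (rewards - q_hat[idx, actions])
    return {g: float(dr[groups == g].mean()) for g in np.unique(groups)}

def disparity(group_values):
    # Assumed disparity measure: gap between best- and worst-off group.
    vals = list(group_values.values())
    return max(vals) - min(vals)
```

In a constrained training loop, a quantity like `disparity(...)` would be the term that is bounded by a threshold or penalized, while the overall estimated reward is maximized.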

Findings

01

Reduces reward disparities on both synthetic and real-world datasets.

02

Maintains competitive overall reward performance.

03

Provides convergence guarantees for the optimization process.

Abstract

Offline contextual bandits allow one to learn policies from historical/offline data without requiring online interaction. However, offline policy optimization that maximizes overall expected rewards can unintentionally amplify the reward disparities across groups. As a result, some groups might benefit more than others from the learned policy, raising concerns about fairness, especially when the resources are limited. In this paper, we study a group-sensitive fairness constraint in offline contextual bandits, reducing group-wise reward disparities that may arise during policy learning. We tackle the following common-parity requirements: the reward disparity is constrained within some user-defined threshold or the reward disparity should be minimized during policy optimization. We propose a constrained offline policy optimization framework by introducing group-wise reward disparity…
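The two parity requirements named in the abstract could be written roughly as follows. This is a sketch under stated assumptions: the symbols $V(\pi)$, $V_g(\pi)$, $\epsilon$, and $\lambda$, the two-group setting with groups $a$ and $b$, and the penalty form of the second objective are introduced here for illustration only, and the paper's own notation and formulation may differ.

```latex
% Assumed notation: V(\pi) is the overall expected reward of policy \pi,
% and V_g(\pi) is the expected reward of \pi restricted to group g.

% (1) Disparity kept within a user-defined threshold \epsilon:
\max_{\pi} \; V(\pi)
\quad \text{s.t.} \quad
\bigl| V_{a}(\pi) - V_{b}(\pi) \bigr| \le \epsilon

% (2) Disparity minimized during policy optimization,
%     written here with an assumed penalty weight \lambda \ge 0:
\max_{\pi} \; V(\pi) - \lambda \, \bigl| V_{a}(\pi) - V_{b}(\pi) \bigr|
```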

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Ethics and Social Impacts of AI