Off-Policy Learning with Limited Supply

Koichi Tanaka; Ren Kishimoto; Bushun Kawagishi; Yusuke Narita; Yasuo Yamamoto; Nobuyuki Shimizu; Yuta Saito

arXiv:2603.18702·cs.LG·May 19, 2026

TL;DR

This paper addresses off-policy learning in contextual bandits with supply constraints, proposing a new method called OPLS that improves item allocation efficiency under limited resources.

Contribution

The paper introduces OPLS, an off-policy learning approach tailored to limited-supply settings, supported by theoretical analysis and by experiments in which it outperforms existing methods on synthetic datasets.

Findings

01. OPLS outperforms existing methods on synthetic datasets.

02. Conventional greedy approaches may fail under supply constraints.

03. Theoretical analysis highlights the importance of considering supply limitations.

Abstract

We study off-policy learning (OPL) in contextual bandits, which plays a key role in a wide range of real-world applications such as recommendation systems and online advertising. Typical OPL in contextual bandits assumes an unconstrained environment where a policy can select the same item infinitely. However, in many practical applications, including coupon allocation and e-commerce, limited supply constrains items through budget limits on distributed coupons or inventory restrictions on products. In these settings, greedily selecting the item with the highest expected reward for the current user may lead to early depletion of that item, making it unavailable for future users who could potentially generate higher expected rewards. As a result, OPL methods that are optimal in unconstrained settings may become suboptimal in limited supply settings. To address the issue, we provide a…
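The depletion effect described in the abstract can be illustrated with a toy example. The sketch below is not the paper's OPLS method: the reward table `Q`, the supply vector `SUPPLY`, and both helper functions are hypothetical, chosen only to show how a greedy rule can exhaust an item that a later user would have valued more. It compares greedy per-user selection against a brute-force search over all supply-feasible assignments.

```python
from itertools import product

# Hypothetical expected rewards Q[user][item]; these numbers are made up
# for illustration and do not come from the paper.
Q = [
    [0.9, 0.8],  # user 0 arrives first and likes both items
    [0.9, 0.1],  # user 1 arrives second and only likes item 0
]
SUPPLY = [1, 1]  # assumed: each item can be allocated exactly once


def greedy_allocation(q, supply):
    """Give each arriving user the highest-reward item still in stock."""
    remaining = list(supply)
    choices, total = [], 0.0
    for rewards in q:
        available = [a for a in range(len(remaining)) if remaining[a] > 0]
        if not available:
            choices.append(None)  # nothing left to allocate
            continue
        best = max(available, key=lambda a: rewards[a])
        remaining[best] -= 1
        total += rewards[best]
        choices.append(best)
    return choices, total


def best_allocation(q, supply):
    """Brute-force the supply-feasible assignment with the highest total reward."""
    n_items = len(supply)
    best_choices, best_total = None, float("-inf")
    for choices in product(range(n_items), repeat=len(q)):
        # Skip assignments that use any item more often than its supply allows.
        if any(choices.count(a) > supply[a] for a in range(n_items)):
            continue
        total = sum(q[u][a] for u, a in enumerate(choices))
        if total > best_total:
            best_choices, best_total = list(choices), total
    return best_choices, best_total


if __name__ == "__main__":
    print("greedy:", greedy_allocation(Q, SUPPLY))
    print("best:  ", best_allocation(Q, SUPPLY))
```

In this toy instance the greedy rule gives item 0 to the first user, leaving the second user with an item worth only 0.1 to them, for a total of about 1.0. Holding item 0 back for the second user yields a total of about 1.7. The brute-force search is only practical for tiny instances and is used here purely as a reference point.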
