Balancing Immediate Revenue and Future Off-Policy Evaluation in Coupon Allocation
Naoki Nishimura, Ken Kobayashi, and Kazuhide Nakata

TL;DR
This paper introduces a mixed policy approach for coupon allocation that balances immediate revenue maximization with future policy improvement through off-policy evaluation, using a multi-objective optimization framework.
Contribution
It proposes a novel mixed policy combining deterministic and randomized strategies, and formulates the optimal mixture ratio as a multi-objective optimization problem.
Findings
The mixed policy balanced revenue and exploration effectively in synthetic-data experiments.
The experiments demonstrated that the trade-off between data collection and revenue can be adjusted flexibly.
The framework enables quantitative evaluation of this trade-off.
Abstract
Coupon allocation drives customer purchases and boosts revenue. However, it presents a fundamental trade-off between exploiting the current optimal policy to maximize immediate revenue and exploring alternative policies to collect data for future policy improvement via off-policy evaluation (OPE). To balance this trade-off, we propose a novel approach that combines a model-based revenue maximization policy and a randomized exploration policy for data collection. Our framework enables flexible adjustment of the mixture ratio between these two policies to optimize the balance between short-term revenue and future policy improvement. We formulate the problem of determining the optimal mixture ratio as multi-objective optimization, enabling quantitative evaluation of this trade-off. We empirically verified the effectiveness of the proposed mixed policy using synthetic data. Our main…
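The mixing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the exploration policy is uniform over coupon options, that the mixture ratio is a single scalar `alpha`, and that future evaluation uses a plain inverse propensity scoring (IPS) estimator. The function names `mixed_policy_probs`, `sample_action`, and `ips_value` are hypothetical.

```python
import random


def mixed_policy_probs(greedy_action, n_actions, alpha):
    """Action probabilities for one customer under the mixed policy.

    With probability (1 - alpha) the model-based (greedy) coupon is chosen;
    with probability alpha a coupon is drawn uniformly at random.
    Assumption: uniform exploration, scalar mixture ratio alpha in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    probs = [alpha / n_actions] * n_actions
    probs[greedy_action] += 1.0 - alpha
    return probs


def sample_action(probs, rng):
    """Draw one coupon index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def ips_value(eval_probs, logged_actions, rewards, logging_probs):
    """IPS estimate of an evaluation policy's value from logged data.

    Each argument has one entry per customer; eval_probs and logging_probs
    hold per-customer lists of action probabilities.
    """
    total = 0.0
    for p_eval, action, reward, p_log in zip(
        eval_probs, logged_actions, rewards, logging_probs
    ):
        # Importance weight: evaluation probability over logging probability.
        total += p_eval[action] / p_log[action] * reward
    return total / len(rewards)


# Example: two customers, three coupon options, 30% exploration.
rng = random.Random(0)
logging_probs = [mixed_policy_probs(g, 3, alpha=0.3) for g in (0, 2)]
logged_actions = [sample_action(p, rng) for p in logging_probs]
```

In this sketch, a larger `alpha` gives every coupon a higher minimum logging probability, which keeps the importance weights small and the IPS estimate more stable, at the cost of deviating from the revenue-maximizing choice more often. With `alpha = 0` the logging policy is deterministic, and the weights are undefined for any coupon the greedy policy never selects.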
Taxonomy
Topics: Gender, Labor, and Family Dynamics
